Let $(X, d_X)$ be an $n$-point metric space. We show that there exists a distribution $\mathcal{D}$ over non-contractive embeddings into trees $f : X \to T$ such that for every $x \in X$,

\[ \mathbb{E}_{\mathcal{D}}\left[\max_{y \in X \setminus \{x\}} \frac{d_T(f(x), f(y))}{d_X(x, y)}\right] \le C(\log n)^2, \]

where $C$ is a universal constant. Conversely we show that the above quadratic dependence on $\log n$ cannot be improved in general. Such embeddings, which we call maximum gradient embeddings, yield a framework for the design of approximation algorithms for a wide range of clustering problems with monotone costs, including fault-tolerant versions of $k$-median and facility location.

1 Introduction

Metric embeddings are an invaluable tool in analysis, Riemannian geometry, group theory, graph theory, and the design of approximation algorithms. In most cases embeddings are used to "simplify" a geometric object that we wish to understand, or on which we need to perform certain algorithmic tasks. Thus one tries to faithfully represent a metric space as a subset of another space with controlled geometry, whose structure is well enough understood to successfully address the problem at hand. There is some obvious flexibility in this approach: both the choice of target space and the notion of faithfulness of an embedding can be adapted to the problem that we wish to solve. Of course, once these choices are made, the main difficulty is the construction of the required embedding, and in the algorithmic context we have the additional requirement that the embedding can be computed efficiently.

In this paper we introduce a new notion of embedding, called maximum gradient embeddings, which turns out to be perfectly suited for approximating a wide range of clustering problems. We then provide optimal maximum gradient embeddings of general finite metric spaces, and use them to design approximation algorithms for several clustering problems. These embeddings yield a generic approach to many problems, and we give some examples that illustrate this fact.

Due to their special structure, it is natural to try to embed metric spaces into trees. This is especially 
important for algorithmic purposes, as many hard problems are tractable on trees. Unfortunately, this is too 
much to hope for in the bi-Lipschitz category: as shown by Rabinovich and Raz [35], the $n$-cycle incurs
distortion $\Omega(n)$ in any embedding into a tree. However, one can relax this idea and look for a random
embedding into a tree which is faithful on average. 

Randomized embeddings into trees via mappings which do not contract distances (also known as prob- 
abilistic embeddings into dominating trees) became an important algorithmic paradigm due to the work of 
Bartal [3, 4] (see also [1, 16] for the related problem of embedding graphs into distributions over spanning
trees). This work led to the design of many approximation algorithms for a wide range of NP hard prob- 
lems. In some cases the best known approximation factors are due to the "probabilistic tree" approach, 
while in other cases improved algorithms have been subsequently found after the original application of 
probabilistic embeddings was discovered. But in both cases it is clear that the strength of Bartal's approach
is that it is generic: For a certain type of problem one can quickly get a polylogarithmic approximation 
using probabilistic embedding into trees, and then proceed to analyze certain particular cases if one desires 
to find better approximation guarantees. However, probabilistic embeddings into trees do not always work. 
In Q Bartal and Mendel introduced the weaker notion of multi-embeddings, and used it to design improved 
algorithms for special classes of metric spaces. Here we strengthen this notion to maximum gradient embed- 
dings, yielding a faithfulness measure which is nevertheless weaker than bi-Lipschitz, and use it to design 
approximation algorithms for harder problems to which regular probabilistic embeddings do not apply. 

Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and fix a mapping $f : X \to Y$. We shall say that $f$ is non-contractive if for every $x, y \in X$ we have $d_Y(f(x), f(y)) \ge d_X(x, y)$. The maximum gradient of $f$ at a point $x \in X$ is defined as

\[ |\nabla f(x)|_\infty = \sup_{y \in X \setminus \{x\}} \frac{d_Y(f(x), f(y))}{d_X(x, y)}. \tag{1} \]

Thus the Lipschitz constant of $f$ is given by

\[ \|f\|_{\mathrm{Lip}} = \sup_{x \in X} |\nabla f(x)|_\infty. \]
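For a finite metric space, definition (1) can be evaluated directly. The following is a minimal sketch, not taken from the paper: the function names and the toy example (the 4-cycle mapped onto a path) are assumptions made purely for illustration.

```python
def max_gradient(points, d_x, d_y, f, x):
    """Maximum gradient of f at x: largest stretch d_y(f(x), f(y)) / d_x(x, y) over y != x."""
    return max(d_y(f(x), f(y)) / d_x(x, y) for y in points if y != x)


def lipschitz_constant(points, d_x, d_y, f):
    """Lipschitz constant of f: the largest maximum gradient over all points."""
    return max(max_gradient(points, d_x, d_y, f, x) for x in points)


# Toy example (illustrative assumption): the 4-cycle mapped onto the path 0-1-2-3.
points = [0, 1, 2, 3]


def cycle_distance(a, b):
    k = abs(a - b)
    return min(k, 4 - k)


def path_distance(a, b):
    return abs(a - b)


def identity(v):
    return v


gradient_at_0 = max_gradient(points, cycle_distance, path_distance, identity, 0)
gradient_at_1 = max_gradient(points, cycle_distance, path_distance, identity, 1)
lipschitz = lipschitz_constant(points, cycle_distance, path_distance, identity)
```

In this toy example the stretch is concentrated at the endpoints of the deleted edge $\{3, 0\}$, while the interior vertex 1 sees no stretch at all; this per-point view is exactly what the maximum gradient records and what the Lipschitz constant hides.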

Note that in the mathematical literature, mostly in the context of the study of isoperimetry on general 
geodesic metric measure spaces (see for example liSiCSil). it is common to define the modulus of the gradient 
of $f$ at $x \in X$ as

\[ |\nabla f(x)| = \limsup_{y \to x} \frac{d_Y(f(x), f(y))}{d_X(x, y)}. \tag{2} \]

The definition in (2) is very natural in the context of connected metric spaces, but in the context of finite metric spaces it clearly makes more sense to deal with the maximum gradient as defined in (1).

In what follows when we refer to a tree metric we mean the shortest-path metric on a graph-theoretical 
tree with weighted edges. Recall that $(U, d_U)$ is an ultrametric if for every $u, v, w \in U$ we have $d_U(u, v) \le \max\{d_U(u, w), d_U(w, v)\}$. It is well known that ultrametrics are tree metrics. The following result is due to Fakcharoenphol, Rao and Talwar [17], and is a slight improvement over an earlier theorem of Bartal [4]. For every $n$-point metric space $(X, d_X)$ there is a distribution $\mathcal{D}$ over non-contractive embeddings into ultrametrics $f : X \to U$ such that

\[ \max_{\substack{x, y \in X \\ x \ne y}} \frac{\mathbb{E}_{\mathcal{D}}\left[d_U(f(x), f(y))\right]}{d_X(x, y)} = O(\log n). \tag{3} \]

The logarithmic upper bound in (3) cannot be improved in general.
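The ultrametric condition recalled above (the strong triangle inequality) is easy to test on a finite point set. The sketch below is illustrative only; the helper name `is_ultrametric` and both example metrics are assumptions, not objects from the paper.

```python
from itertools import permutations


def is_ultrametric(points, d, tol=1e-12):
    """Check d(u, v) <= max(d(u, w), d(w, v)) for all ordered triples of distinct points."""
    return all(
        d(u, v) <= max(d(u, w), d(w, v)) + tol
        for u, v, w in permutations(points, 3)
    )


def two_level(u, v):
    """Toy ultrametric on {0, 1, 2, 3}: distance 1 inside the pairs {0, 1}, {2, 3}, else 4."""
    if u == v:
        return 0
    return 1 if u // 2 == v // 2 else 4


def path_metric(u, v):
    """Shortest-path metric on an unweighted path; a tree metric but not an ultrametric."""
    return abs(u - v)


two_level_ok = is_ultrametric([0, 1, 2, 3], two_level)
path_ok = is_ultrametric([0, 1, 2], path_metric)
```

The path fails the test because $d(0, 2) = 2$ exceeds $\max\{d(0, 1), d(1, 2)\} = 1$, which is a small instance of why paths are costly to embed into ultrametrics.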

Inequality (3) is extremely useful for optimization problems whose objective function is linear in the
distances, since by linearity of expectation it reduces such tasks to trees, with only a logarithmic loss in the
approximation guarantee. When it comes to non-linear problems, the use of (3) is very limited. We will
show that this issue can be addressed using the following theorem, which is our main result. 
Theorem 1. Let $(X, d_X)$ be an $n$-point metric space. Then there exists a distribution $\mathcal{D}$ over non-contractive embeddings into ultrametrics $f : X \to U$ (thus both the ultrametric $(U, d_U)$ and the mapping $f$ are random) such that for every $x \in X$,

\[ \mathbb{E}_{\mathcal{D}} |\nabla f(x)|_\infty \le C(\log n)^2, \]

where $C$ is a universal constant.

On the other hand there exists a universal constant $c > 0$ and arbitrarily large $n$-point metric spaces $Y_n$ such that for any distribution $\mathcal{D}$ over non-contractive embeddings into trees $f : Y_n \to T$ there is necessarily some $x \in Y_n$ for which

\[ \mathbb{E}_{\mathcal{D}} |\nabla f(x)|_\infty \ge c(\log n)^2. \]

We call embeddings as in Theorem 1, i.e. embeddings with small expected maximum gradient, maximum gradient embeddings into distributions over trees (in what follows we will only deal with distributions over trees, so we will drop the last part of this title when referring to the embedding, without creating any ambiguity). The proof of the upper bound in Theorem 1 is a modification of an argument of Fakcharoenphol, Rao and Talwar [17], which is based on ideas from [3, 11]. It uses the same stochastic decomposition of metric spaces as in [17], but it relies on properties of it which are well known to experts, yet have not been exploited in full strength in previous applications. The argument appears in Section 2. Alternative proofs of the main technical step of the proof of the upper bound in Theorem 1 can also be deduced from the results of [32] or an argument in the proof of Lemma 2.1 in [20]. In both of these references the required inequality is deduced from an improved analysis of the specific stochastic decomposition of Calinescu, Karloff and Rabani [11] that was used in [17]. Here we present a different approach, which shows that the "padding inequality" proved by Fakcharoenphol, Rao and Talwar in [17] can be used as a "black box" to yield a maximum gradient embedding, and there is no need to recall how the stochastic decomposition was originally defined.

The heart of this paper is the lower bound in Theorem 1. The metrics $Y_n$ in Theorem 1 are the diamond graphs of Newman and Rabinovich [34], which will be defined in Section 3. These graphs have been previously used as counter-examples in several embedding problems (see [10, 21, 29, 34]). In particular, we were inspired to consider these examples by the proof in [21] of the fact that they require distortion $\Omega(\log n)$ in any probabilistic embedding into trees. However, our proof of the $\Omega((\log n)^2)$ lower bound in Theorem 1 is considerably more delicate than the proof in [21]. This proof, together with other lower bounds for maximum gradient embeddings, is presented in Section 3.

1.1 A framework for clustering problems with monotone costs 

We now turn to some algorithmic applications of Theorem 1. The general reduction in Theorem 2 below should also be viewed as an explanation of why maximum gradient embeddings are so natural: they are precisely the notion of embedding which allows such reductions to go through.

A general setting of the clustering problem is as follows. Let $X$ be an $n$-point set, and denote by $\mathrm{MET}(X)$ the set of all metrics on $X$. A possible clustering solution consists of sets of the form $\{(x_1, C_1), \ldots, (x_k, C_k)\}$ where $x_1, \ldots, x_k \in X$ and $C_1, \ldots, C_k \subseteq X$. We think of $C_1, \ldots, C_k$ as the clusters, and $x_i$ as the "center" of $C_i$. In this general framework we do not require that the clusters cover $X$, or that they are pairwise disjoint, or that they contain their centers. Thus the space of possible clustering solutions is $\mathcal{S} := 2^{X \times 2^X}$ (though the exact structure of $\mathcal{S}$ does not play a role in the proof of Theorem 2 below). Assume that for every point $x \in X$, every metric $d \in \mathrm{MET}(X)$, and every possible clustering solution $P \in \mathcal{S}$, we are given $\Gamma(x, d, P) \in [0, \infty]$, which we think of as a measure of the dissatisfaction of $x$ with respect to $P$ and $d$. Our goal is to minimize the average dissatisfaction of the points of $X$. Formally, given a measure of dissatisfaction (which we also call in what follows a clustering cost function) $\Gamma : X \times \mathrm{MET}(X) \times \mathcal{S} \to [0, \infty]$, we wish to compute for a given metric $d \in \mathrm{MET}(X)$ the value

\[ \mathrm{Opt}_\Gamma(X, d) := \min\left\{ \sum_{x \in X} \Gamma(x, d, P) : P \in \mathcal{S} \right\}. \]

(Since we are mainly concerned with the algorithmic aspect of this problem, we assume from now on that $\Gamma$ can be computed efficiently.)

We make two natural assumptions on the cost function $\Gamma$. First of all, we will assume that it scales homogeneously with respect to the metric, i.e. for every $\lambda > 0$, $x \in X$, $d \in \mathrm{MET}(X)$ and $P \in \mathcal{S}$ we have $\Gamma(x, \lambda d, P) = \lambda \Gamma(x, d, P)$. Secondly we will assume that $\Gamma$ is monotone with respect to the metric, i.e. if $d, \overline{d} \in \mathrm{MET}(X)$ and $x \in X$ satisfy $d(x, y) \le \overline{d}(x, y)$ for every $y \in X$ then $\Gamma(x, d, P) \le \Gamma(x, \overline{d}, P)$. In other words, if all the points in $X$ are farther from $x$ with respect to $\overline{d}$ than they are with respect to $d$, then $x$ is more dissatisfied. This is a very natural assumption to make, as most clustering problems look for clusters which are small in various (metric) senses. We call clustering problems with $\Gamma$ satisfying these assumptions monotone clustering problems. Essentially all the algorithmic minimization problems that have benefitted from an application of (3) can be cast as monotone clustering problems, but this framework also applies to some "non-linear" clustering optimization problems, as we shall see presently.
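To make the two assumptions concrete, here is a small sketch that uses the $k$-median cost (distance to the nearest center) as the dissatisfaction function and checks homogeneity and monotonicity numerically on a toy instance. Representing a clustering solution by its tuple of centers only, the brute-force optimizer, and the example metrics are all simplifying assumptions made for illustration.

```python
from itertools import combinations


def kmedian_cost(x, d, centers):
    """Dissatisfaction of x: distance to its nearest center (clusters are ignored here)."""
    return min(d(x, c) for c in centers)


def opt(points, d, k, cost):
    """Brute-force optimum over all k-subsets of centers; exponential, for tiny inputs only."""
    return min(
        sum(cost(x, d, centers) for x in points)
        for centers in combinations(points, k)
    )


points = [0, 1, 5, 6]


def line(a, b):
    return abs(a - b)


def scaled(a, b):
    return 3 * line(a, b)          # the metric 3 * d


def larger(a, b):
    return line(a, b) + (1 if a != b else 0)   # a metric dominating d pointwise


base_opt = opt(points, line, 2, kmedian_cost)
scaled_opt = opt(points, scaled, 2, kmedian_cost)
monotone_ok = all(
    kmedian_cost(x, line, c) <= kmedian_cost(x, larger, c)
    for x in points
    for c in combinations(points, 2)
)
```

Scaling the metric by 3 scales the optimum by 3, and replacing the metric by a pointwise larger one never decreases any point's cost, which are exactly the two properties required of a monotone clustering cost.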

The following theorem is a simple application of Theorem 1. It shows that it is enough to solve monotone clustering problems on ultrametrics, with only a polylogarithmic loss in the approximation factor.

Theorem 2 (reduction to ultrametrics). Let $X$ be an $n$-point set and fix a homogeneous monotone clustering cost function $\Gamma : X \times \mathrm{MET}(X) \times \mathcal{S} \to [0, \infty]$. Assume that there is a randomized polynomial time algorithm which approximates $\mathrm{Opt}_\Gamma(X, \rho)$ to within a factor $\alpha(n)$ on any ultrametric $\rho \in \mathrm{MET}(X)$. Then there is a randomized polynomial time algorithm which approximates $\mathrm{Opt}_\Gamma(X, d)$ on any metric $d \in \mathrm{MET}(X)$ to within a factor of $O\left(\alpha(n)(\log n)^2\right)$.

Proof. Let $(X, d)$ be an $n$-point metric space and let $\mathcal{D}$ be the distribution over random ultrametrics $\rho$ on $X$ from Theorem 1 (which is computable in polynomial time, as follows directly from our proof of Theorem 1 in Section 2). In other words, $\rho(x, y) \ge d(x, y)$ for all $x, y \in X$ and

\[ \max_{x \in X} \mathbb{E}_{\mathcal{D}}\left[\max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d(x, y)}\right] \le C(\log n)^2. \]

Let $P \in \mathcal{S}$ be a clustering solution for which

\[ \mathrm{Opt}_\Gamma(X, d) = \sum_{x \in X} \Gamma(x, d, P). \]

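Using the monotonicity and homogeneity of $\Gamma$ we see that

\[ \mathrm{Opt}_\Gamma(X, \rho) \le \sum_{x \in X} \Gamma(x, \rho, P) \le \sum_{x \in X} \Gamma\left(x, \left[\max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d(x, y)}\right] \cdot d, P\right) = \sum_{x \in X} \left[\max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d(x, y)}\right] \Gamma(x, d, P). \]

Taking expectation we conclude that

\[ \mathbb{E}_{\mathcal{D}}\, \mathrm{Opt}_\Gamma(X, \rho) \le \sum_{x \in X} \left(\mathbb{E}_{\mathcal{D}} \max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d(x, y)}\right) \Gamma(x, d, P) \le C(\log n)^2 \cdot \mathrm{Opt}_\Gamma(X, d). \]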
Hence, with probability at least $\frac{1}{2}$ we have

\[ \mathrm{Opt}_\Gamma(X, \rho) \le 2C(\log n)^2 \cdot \mathrm{Opt}_\Gamma(X, d). \]

For such $\rho$ compute a clustering solution $Q \in \mathcal{S}$ satisfying

\[ \sum_{x \in X} \Gamma(x, \rho, Q) \le \alpha(n)\, \mathrm{Opt}_\Gamma(X, \rho) \le 2C\alpha(n)(\log n)^2 \cdot \mathrm{Opt}_\Gamma(X, d). \]

Since $\rho \ge d$ it remains to use the monotonicity of $\Gamma$ once more to deduce that

\[ \sum_{x \in X} \Gamma(x, \rho, Q) \ge \sum_{x \in X} \Gamma(x, d, Q) \ge \mathrm{Opt}_\Gamma(X, d). \]

Thus $Q$ is a $O\left(\alpha(n)(\log n)^2\right)$ approximate solution to the clustering problem on $X$ with cost $\Gamma$. □
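The proof above is effectively an algorithm: sample a dominating ultrametric, solve the problem there, and return that solution for the original metric. The sketch below follows this outline under loud assumptions: `sample_ultrametric` and `solve_on_ultrametric` are hypothetical callables supplied by the caller (the paper's actual sampler is the construction of Section 2, which is not implemented here), and repeating the experiment and keeping the cheapest candidate under $d$ is a standard amplification step added for illustration rather than something stated in the proof.

```python
import random
from itertools import combinations


def reduce_to_ultrametric(points, d, sample_ultrametric, solve_on_ultrametric,
                          total_cost, trials=8, seed=0):
    """Sample dominating ultrametrics, solve on each, keep the solution cheapest under d."""
    rng = random.Random(seed)
    best_value, best_solution = float("inf"), None
    for _ in range(trials):
        rho = sample_ultrametric(points, d, rng)      # hypothetical sampler, rho >= d
        solution = solve_on_ultrametric(points, rho)  # hypothetical ultrametric solver
        value = total_cost(points, d, solution)       # evaluate under the original metric
        if value < best_value:
            best_value, best_solution = value, solution
    return best_value, best_solution


# Toy demonstration: d is already an ultrametric, so the "sampler" just returns it.
def two_level(u, v):
    if u == v:
        return 0
    return 1 if u // 2 == v // 2 else 4


def trivial_sampler(points, d, rng):
    return d


def total_kmedian_cost(points, d, centers):
    return sum(min(d(x, c) for c in centers) for x in points)


def brute_force_2median(points, rho):
    return min(combinations(points, 2),
               key=lambda centers: total_kmedian_cost(points, rho, centers))


value, centers = reduce_to_ultrametric(
    [0, 1, 2, 3], two_level, trivial_sampler, brute_force_2median, total_kmedian_cost
)
```

Evaluating each candidate under $d$ rather than under $\rho$ is justified by the last step of the proof: by monotonicity the cost under $d$ never exceeds the cost under the dominating ultrametric.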

Theorem 2 is a generic reduction, and in many particular cases it might be possible to use a case-specific analysis to improve the $O\left((\log n)^2\right)$ loss in the approximation factor. However, as a general reduction paradigm for clustering problems, Theorem 2 makes it clear why maximum gradient embeddings are natural.

We shall now demonstrate the applicability of the monotone clustering framework to two concrete examples called fault-tolerant $k$-median clustering and $\Sigma\ell_p$ clustering. We are not aware of a previous investigation of these problems, but we believe that they are quite natural. It also seems plausible that, just as in the problems for which Bartal's method originally yielded the first non-trivial algorithmic results, a better approximation factor might be obtainable via more problem-specific tools.

Fault-tolerant $k$-median and facility location. The $k$-median problem is as follows. Given an $n$-point metric space $(X, d_X)$ and $k \in \mathbb{N}$, find $x_1, \ldots, x_k \in X$ that minimize the objective function

\[ \sum_{x \in X} \min_{j \in \{1, \ldots, k\}} d_X(x, x_j). \tag{4} \]

This very natural and well studied problem can be easily cast as a monotone clustering problem by defining $\Gamma(x, d, \{(x_1, C_1), \ldots, (x_m, C_m)\})$ to be $\infty$ if $m \ne k$, and otherwise

\[ \Gamma(x, d, \{(x_1, C_1), \ldots, (x_m, C_m)\}) = \min_{j \in \{1, \ldots, k\}} d(x, x_j). \]

The linear structure of (4) makes it a prime example of a problem which can be approximated using Bartal's probabilistic embeddings. Indeed, the first non-trivial approximation algorithm for $k$-median clustering was obtained by Bartal in [4] (another such example is Min-Sum clustering, see [5]). Since then this problem has been investigated extensively: the first constant factor approximation for it was obtained in [13] using LP rounding, and the first combinatorial (primal-dual) constant-factor algorithm was obtained in [24]. In [2] an analysis of a natural local search heuristic yields the best known approximation factor for $k$-median clustering.

Here we study the following fault-tolerant version of the $k$-median problem. Let $(X, d)$ be an $n$-point metric space and fix $k \in \mathbb{N}$. Assume that for every $x \in X$ we are given an integer $j(x) \in \{1, \ldots, k\}$ (which we call the fault-tolerant parameter of $x$). Given $x_1, \ldots, x_k \in X$ and $x \in X$ let $x_j^*(x; d)$ be the $j$-th closest point to $x$ in $\{x_1, \ldots, x_k\}$. In other words, $\{x_j^*(x; d)\}_{j=1}^k$ is a re-ordering of $\{x_j\}_{j=1}^k$ such that $d(x, x_1^*(x; d)) \le \cdots \le d(x, x_k^*(x; d))$. Our goal is to minimize the objective function

\[ \sum_{x \in X} d\left(x, x_{j(x)}^*(x; d)\right). \tag{5} \]

To understand (5) assume for the sake of simplicity that $j(x) = j$ for all $x \in X$. If $\{x_j\}_{j=1}^k$ minimize (5) and $j - 1$ of them are deleted (due to possible noise), then we are still ensured that on average every point in $X$ is close to one of the $x_j$. In this sense the clustering problem in (5) is fault-tolerant. In other words, the optimum solution of (5) is insensitive to (controlled) noise. Observe that for $j = 1$ we return to the $k$-median clustering problem.
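The objective (5) is simple to evaluate for a given set of centers: each point pays its distance to its $j(x)$-th nearest center. The sketch below is a direct transcription under illustrative assumptions (the function name, the line metric and the chosen centers are not from the paper).

```python
def fault_tolerant_cost(points, d, centers, j):
    """Objective (5): each point x pays the distance to its j(x)-th closest center."""
    total = 0
    for x in points:
        distances = sorted(d(x, c) for c in centers)
        total += distances[j(x) - 1]   # j(x) is 1-based, as in the text
    return total


points = [0, 1, 5, 6, 10]
centers = (0, 5, 10)


def line(a, b):
    return abs(a - b)


plain_kmedian = fault_tolerant_cost(points, line, centers, lambda x: 1)
tolerate_one_fault = fault_tolerant_cost(points, line, centers, lambda x: 2)
```

With $j \equiv 1$ this is the ordinary $k$-median cost; with $j \equiv 2$ every point is charged as if its nearest center had failed, which is why the cost rises sharply on this instance.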

We remark that another fault-tolerant version of $k$-median clustering was introduced in [25]. In this problem we connect each point $x$ in the metric space $X$ to $j(x)$ centers, but the objective function is the sum over $x \in X$ of the sum of the distances from $x$ to all the $j(x)$ centers. Once again, the linearity of the objective function seems to make the problem easier, and in [37] a constant factor approximation is achieved (this immediately implies that our version of fault-tolerant $k$-median clustering, i.e. the minimization of (5), has a $O\left(\max_{x \in X} j(x)\right)$ approximation algorithm). In particular, the LP that was previously used for $k$-median clustering naturally generalizes to this setting. This is not the case for our fault-tolerant version in (5). Moreover, the local search techniques for $k$-median clustering (see for example [2]) do not seem to be easily generalizable to the case $j > 1$, and in any case seem to require $n^{\Omega(j)}$ time, which is not polynomial even for moderate values of $j$.

Arguing as above in the case of $k$-median clustering we see that the fault-tolerant $k$-median clustering problem in (5) is a monotone clustering problem. In Section 4.1 we show that it can be solved exactly in polynomial time on ultrametrics. Thus, in combination with Theorem 2 we obtain a $O\left((\log n)^2\right)$ approximation algorithm for the minimization of (5) on general metrics.

Remark 1. Facility location type problems have been studied extensively since the 1960's. We refer to the book [33], and specifically to the chapter on uncapacitated facility location [15], for a discussion of such problems. The uncapacitated metric facility location problem is closely related to the $k$-median problem (indeed $k$-median can be reduced to it via Lagrangian relaxation, see [24]), and has been studied extensively in recent years (see [12, 19, 23, 24, 26, 36]). In the context of (5) we can also consider the following fault-tolerant version of the facility location problem. Assume in addition that we are given non-negative facility costs $\{f_x\}_{x \in X}$. Then the goal is to minimize over all $x_1, \ldots, x_k \in X$ the objective function

\[ \sum_{j=1}^{k} f_{x_j} + \sum_{x \in X} d\left(x, x_{j(x)}^*(x; d)\right). \tag{6} \]

The case $j(x) = 1$ reduces to the classical uncapacitated metric facility location problem. The techniques presented here can be easily generalized to yield a $O\left((\log n)^2\right)$ approximation algorithm for the minimization of (6) as well.

$\Sigma\ell_p$ clustering. Another problem which illustrates the usefulness of Theorem 2 is the $\Sigma\ell_p$ clustering problem which we now describe. Our argument for this problem is quite general, and it applies to more cost functions, but it is beneficial to concentrate on a concrete example. For $p \in [1, \infty]$ the $\Sigma\ell_p$ clustering problem is as follows: for a metric space $(X, d)$ and $k \in \mathbb{N}$ the goal is to find $x_1, \ldots, x_k \in X$ and a partition of $X$ into $k$ sets $C_1, \ldots, C_k \subseteq X$ which minimize the objective function

\[ \sum_{j=1}^{k} \left( \sum_{x \in C_j} d(x, x_j)^p \right)^{1/p}. \tag{7} \]

When $p = 1$ this becomes the $k$-median problem, and when $p = \infty$ this is the "sum of the cluster radii" problem, which has been studied in [14]. In both of these extreme cases there is a constant factor approximation algorithm known, so we automatically get a $O\left(\min\{n^{1/p}, n^{1 - 1/p}\}\right)$ approximation algorithm for (7). Here we shall use the framework of Theorem 2 to give a $O\left((\log n)^2\right)$ approximation algorithm for this problem for general $p$.

Observe that the $\Sigma\ell_p$ clustering problems are monotone clustering problems. Indeed, all we need to do is define $\Gamma(x, d, \{(x_1, C_1), \ldots, (x_m, C_m)\})$ to be $\infty$ if $\{C_1, \ldots, C_m\}$ is not a partition of $X$ or $m \ne k$. Otherwise set $\Gamma(x, d, \{(x_1, C_1), \ldots, (x_k, C_k)\}) = 0$ if $x \notin \{x_1, \ldots, x_k\}$ and for $j \in \{1, \ldots, k\}$,

\[ \Gamma(x_j, d, \{(x_1, C_1), \ldots, (x_k, C_k)\}) = \left( \sum_{x \in C_j} d(x, x_j)^p \right)^{1/p}. \]

This definition clearly makes $\Gamma$ a homogeneous monotone clustering cost function for any $p \in [1, \infty]$. The following lemma, combined with Theorem 2, therefore implies that the $\Sigma\ell_p$ clustering problem has a $O\left((\log n)^2\right)$ approximation algorithm.
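As a concrete reading of the cost just defined, the sketch below evaluates the $\Sigma\ell_p$ objective for a given list of (center, cluster) pairs, treating $p = \infty$ as the cluster radius. The function name, the requirement that clusters be non-empty, and the toy instance are assumptions made for illustration; the partition check from the definition is omitted.

```python
import math


def sum_lp_cost(d, clusters, p):
    """Sum over clusters of the l_p norm of the distances to the cluster's center.

    `clusters` is a list of (center, members) pairs with non-empty member lists.
    """
    total = 0.0
    for center, members in clusters:
        distances = [d(x, center) for x in members]
        if p == math.inf:
            total += max(distances)                       # cluster radius
        else:
            total += sum(t ** p for t in distances) ** (1.0 / p)
    return total


def line(a, b):
    return abs(a - b)


clusters = [(0, [0, 1, 2]), (10, [9, 10])]
cost_p1 = sum_lp_cost(line, clusters, 1)
cost_p2 = sum_lp_cost(line, clusters, 2)
cost_pinf = sum_lp_cost(line, clusters, math.inf)
```

On this instance the three values interpolate between the $k$-median cost ($p = 1$) and the sum of cluster radii ($p = \infty$), as described in the text.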

Lemma 3. The $\Sigma\ell_p$ clustering problem has a constant factor polynomial time approximation algorithm
(even a FPTAS) on ultrametrics.



Lemma 3 will be proved via dynamic programming in Section 4.1.



2 Proof of the upper bound in Theorem 1



We start by recalling some terminology and results concerning random partitions of metric spaces. Given a partition $\mathcal{P}$ of a finite metric space $(X, d_X)$ and $x \in X$ we denote by $\mathcal{P}(x)$ the unique element of $\mathcal{P}$ to which $x$ belongs. For $\Delta > 0$ the partition $\mathcal{P}$ is said to be $\Delta$-bounded if for every $x \in X$ we have $\mathrm{diam}(\mathcal{P}(x)) \le \Delta$. We also fix a positive measure $\mu$ on $X$. The following fundamental result is due to [17] when $\mu$ is the uniform measure on $X$. The case of general measures was observed in [27, 30], and the specific numerical constants used below are taken from [32].

Lemma 4. For every $\Delta > 0$ there exists a distribution over $\Delta$-bounded partitions $\mathcal{P}$ of $X$ such that for every $x \in X$ and every $0 < t \le \Delta/8$,

\[ \Pr\left[B_X(x, t) \not\subseteq \mathcal{P}(x)\right] \le \frac{16t}{\Delta} \cdot \log \frac{\mu(B_X(x, \Delta))}{\mu(B_X(x, \Delta/8))}. \tag{8} \]

We also recall the notion of a quotient of a metric space (see [9, 11, 31]). Let $\mathcal{W} = \{W_1, \ldots, W_m\}$ be a partition of $X$. For $W, W' \in \mathcal{W}$ write $d_X(W, W') = \min\{d_X(x, y) : x \in W,\ y \in W'\}$. The quotient metric space $(X/\mathcal{W}, d_{X/\mathcal{W}})$ is defined as follows. As a set $X/\mathcal{W}$ coincides with $\mathcal{W}$. The metric $d_{X/\mathcal{W}}$ is the maximal metric on $\mathcal{W}$ which is majorized by $d_X(\cdot, \cdot)$. In other words, for $W, W' \in \mathcal{W}$,

\[ d_{X/\mathcal{W}}(W, W') = \min\left\{ \sum_{j=1}^{m-1} d_X(V_{j-1}, V_j) : V_0, \ldots, V_{m-1} \in \mathcal{W},\ V_0 = W,\ V_{m-1} = W' \right\}. \]

Note that the $V_j$'s in the definition above need not be distinct.
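Equivalently, the quotient metric is the shortest-path metric on the complete graph whose vertices are the classes and whose edge weights are the set distances $d_X(W, W')$. The sketch below computes it with the Floyd-Warshall recursion; the function name and the toy partition of points on a line are assumptions chosen so that the shortcut through a third class actually matters.

```python
def quotient_metric(classes, d):
    """Quotient metric on a partition: shortest paths w.r.t. the set distances between classes."""
    m = len(classes)
    # Start from the set distance d_X(W, W') = min over representatives.
    q = [
        [0.0 if a == b else min(d(x, y) for x in classes[a] for y in classes[b])
         for b in range(m)]
        for a in range(m)
    ]
    # Floyd-Warshall: allow chains W = V_0, V_1, ..., V_r = W' through other classes.
    for via in range(m):
        for a in range(m):
            for b in range(m):
                if q[a][via] + q[via][b] < q[a][b]:
                    q[a][b] = q[a][via] + q[via][b]
    return q


def line(a, b):
    return abs(a - b)


# The class {0, 10} is "long": it is close to both 1 and 9, creating a shortcut.
classes = [[0, 10], [1], [9]]
q = quotient_metric(classes, line)
```

Here the classes $\{1\}$ and $\{9\}$ are at set distance 8, but their quotient distance is only 2 because a chain may pass through the class $\{0, 10\}$ at no cost for its diameter. This is the effect that the bound on class diameters in the proof of Lemma 5 is designed to control.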

The following lemma is a well known "quotient version" of Lemma 4. The argument dates back at least to Bartal [3], and appeared in various guises in several other places (see for example [22, 32]). Since we couldn't locate the formulation that we need in the literature, we include a proof here.

Lemma 5. Let $(X, d_X)$ be an $n$-point metric space and $\Delta > 0$. Then there exists a distribution over $\Delta$-bounded partitions $\mathcal{P}$ of $X$ such that for every $x, y \in X$, if $d_X(x, y) \le \frac{\Delta}{2n}$ then $\mathcal{P}(x) = \mathcal{P}(y)$, and for every $x \in X$ and $0 < t \le \Delta/16$,

\[ \Pr\left[B_X(x, t) \not\subseteq \mathcal{P}(x)\right] \le \frac{32t}{\Delta} \cdot \log \frac{\mu(B_X(x, \Delta))}{\mu(B_X(x, \Delta/16))}. \]

Proof. Define an equivalence relation on $X$ by $x \sim y$ if there exist $k \in \mathbb{N}$ and $x_0, \ldots, x_k \in X$ such that $x_0 = x$, $x_k = y$ and $d_X(x_{i-1}, x_i) \le \frac{\Delta}{2n}$ for all $i \in \{1, \ldots, k\}$. Let $\mathcal{W} = \{W_1, \ldots, W_m\}$ be the equivalence classes of this relation, and consider the quotient metric space $X/\mathcal{W}$. We also denote by $\pi : X \to \mathcal{W}$ the induced quotient map, i.e. for $x \in W_j$, $\pi(x) = W_j$. Let $\mu \circ \pi^{-1}$ be the measure on $\mathcal{W}$ given for $W \in \mathcal{W}$ by $\mu \circ \pi^{-1}(W) = \mu(\pi^{-1}(W))$. Observe that for every $x, y \in X$,

\[ d_X(x, y) - \frac{\Delta}{2} \le d_{X/\mathcal{W}}(\pi(x), \pi(y)) \le d_X(x, y). \tag{9} \]

Indeed, the upper bound in ^ is immediate from the definition of a quotient metric. The lower bound 
in (O is proved as follows. There are points x = xq, xi, . . . , x,„^\ = y in X such that dx/wi^i^), ^(y)) = 
Y!'j=\ dx(^{xj-i), n{xj)). For j € |1, . . . ,m - 1) let aj € ;r(xy_i) and bj € nixj) be such that dx{aj,bj) - 
dxin{xj-i),7T{xj)). Since, by the definition of the equivalence relation ~, for all z € X we have diam(7r(z)) = 
max.a,hen{z) dx{a, Zj) < ^, we get that 

m-l m-2 

dx{x,y) < dx{x,ai) + ^ dx{aj,bj) + ^ dx{bj,aj+i) + dx{bm-i,y) 

7=1 7=1 

A A A 

^2^+ dxiw(n{x), n{y)) + {n - 2)— + — , 

implying the lower bound in (9).

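Let $\mathcal{Q}$ be a distribution over $\Delta/2$-bounded partitions of $X/\mathcal{W}$ such that for every $W \in \mathcal{W}$ and every $0 < t \le \Delta/16$ we have

\[ \Pr\left[B_{X/\mathcal{W}}(W, t) \not\subseteq \mathcal{Q}(W)\right] \le \frac{32t}{\Delta} \cdot \log \frac{\mu \circ \pi^{-1}\left(B_{X/\mathcal{W}}(W, \Delta/2)\right)}{\mu \circ \pi^{-1}\left(B_{X/\mathcal{W}}(W, \Delta/16)\right)}. \tag{10} \]

The existence of $\mathcal{Q}$ follows from Lemma 4. Let $\mathcal{P}$ be the partition of $X$ given by $\mathcal{P} = \{\pi^{-1}(A) : A \in \mathcal{Q}\}$. Note that (9) implies that for every $x \in X$ we have $\pi^{-1}\left(B_{X/\mathcal{W}}(\pi(x), \Delta/2)\right) \subseteq B_X(x, \Delta)$ and for every $t > 0$, $\pi^{-1}\left(B_{X/\mathcal{W}}(\pi(x), t)\right) \supseteq B_X(x, t)$. Thus (10) implies that for every $x \in X$ and $0 < t \le \Delta/16$,

\[ \Pr\left[B_X(x, t) \not\subseteq \mathcal{P}(x)\right] \le \Pr\left[B_{X/\mathcal{W}}(\pi(x), t) \not\subseteq \mathcal{Q}(\pi(x))\right] \le \frac{32t}{\Delta} \cdot \log \frac{\mu(B_X(x, \Delta))}{\mu(B_X(x, \Delta/16))}. \]

It remains to note that (9) implies that $\mathcal{P}$ is $\Delta$-bounded and if $d_X(x, y) \le \frac{\Delta}{2n}$ then $x \sim y$, which means that $\pi(x) = \pi(y)$, so that $\mathcal{P}(x) = \mathcal{P}(y)$. □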

Proof of the upper bound in Theorem 1. For every $k \in \mathbb{Z}$ let $\mathcal{P}_k$ be a random partition sampled from the distribution over partitions of $X$ from Lemma 5 with $\Delta = 16^k$, where $\mu$ is the counting measure on $X$ (we assume in what follows that the distributions for different values of $k$ are independent). For $x, y \in X$ let $k$ be the largest integer for which $\mathcal{P}_k(x) \ne \mathcal{P}_k(y)$ (such a $k$ must exist since for small enough $k$ we have $\mathcal{P}_k(z) = \{z\}$ for all $z \in X$). Denote $\rho(x, y) = 16^{k+1}$. Then $\rho$ is a (random) ultrametric on $X$. Indeed, if $x, y, z \in X$ and $\rho(x, y) = 16^{k+1}$ then $\mathcal{P}_k(x) \ne \mathcal{P}_k(y)$. It follows that either $\mathcal{P}_k(z) \ne \mathcal{P}_k(x)$ or $\mathcal{P}_k(z) \ne \mathcal{P}_k(y)$. Thus by the definition of $\rho$ we have that $\max\{\rho(x, z), \rho(y, z)\} \ge \rho(x, y)$. Note also that if $\rho(x, y) = 16^{k+1}$ then $\mathcal{P}_{k+1}(x) = \mathcal{P}_{k+1}(y)$, so that $d_X(x, y) \le \mathrm{diam}(\mathcal{P}_{k+1}(x)) \le 16^{k+1} = \rho(x, y)$. It follows that the identity mapping on $X$ is a random non-contractive embedding of $X$ into the ultrametric $(X, \rho)$. Finally, since whenever $d_X(x, y) \le \frac{16^k}{2n}$ we have $\mathcal{P}_k(x) = \mathcal{P}_k(y)$, we are ensured that $\rho(x, y) \le 32n\, d_X(x, y)$ for every $x, y \in X$.
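The definition of $\rho$ can be sketched in code once the partitions are given. The example below does not sample partitions as in Lemma 5; it takes a small hand-made family of $16^k$-bounded partitions (an assumption for illustration) and only demonstrates how $\rho$ is read off from the largest scale at which two points are separated.

```python
def rho_from_partitions(partitions, x, y):
    """rho(x, y) = 16**(k + 1), where k is the largest level at which x and y are separated.

    `partitions` maps each level k to a list of blocks (sets). The caller is assumed to
    supply a lowest level consisting of singletons, so distinct points are always separated
    somewhere, and a highest level with a single block.
    """
    if x == y:
        return 0
    separated_levels = [
        k for k, blocks in partitions.items()
        if not any(x in block and y in block for block in blocks)
    ]
    return 16 ** (max(separated_levels) + 1)


# Points 0, 1, 20 on the real line; the level-k partition has blocks of diameter <= 16**k.
partitions = {
    -1: [{0}, {1}, {20}],
    0: [{0, 1}, {20}],
    1: [{0, 1}, {20}],
    2: [{0, 1, 20}],
}
points = [0, 1, 20]
rho = {(x, y): rho_from_partitions(partitions, x, y) for x in points for y in points}
```

In this toy family $\rho(0, 1) = 1$ and $\rho(0, 20) = \rho(1, 20) = 256$, so $\rho$ dominates the line metric and satisfies the strong triangle inequality, in line with the argument above.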

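Denote for $x \in X$ and $i \in \mathbb{Z}$, $A_i(x) = B_X(x, 16^i) \setminus B_X(x, 16^{i-1})$. For every $j \in \mathbb{N}$ and $k \in \mathbb{Z}$, if $B_X(x, 16^{k-j}) \subseteq \mathcal{P}_k(x)$ then for every $y \in B_X(x, 16^{k-j})$ we have $\mathcal{P}_k(x) = \mathcal{P}_k(y)$, and therefore by the definition of $\rho(x, y)$ we have $\rho(x, y) \le 16^k$. Thus, if $y \in A_{k-j}(x)$ we have $\rho(x, y) \le 16^k \le 16^{j+1} d_X(x, y)$. This establishes the following inclusion of events:

\[ \left\{ \max_{y \in A_{k-j}(x)} \frac{\rho(x, y)}{d_X(x, y)} > 16^{j+1} \right\} \subseteq \left\{ B_X(x, 16^{k-j}) \not\subseteq \mathcal{P}_k(x) \right\}. \]

Hence

\[ \Pr\left[ \max_{y \in A_{k-j}(x)} \frac{\rho(x, y)}{d_X(x, y)} > 16^{j+1} \right] \le \Pr\left[ B_X(x, 16^{k-j}) \not\subseteq \mathcal{P}_k(x) \right] \le \frac{32}{16^j} \cdot \log \frac{|B_X(x, 16^k)|}{|B_X(x, 16^{k-1})|}. \]

Thus, since $X \setminus \{x\} = \bigcup_{i \in \mathbb{Z}} A_i(x)$, we see that

\[ \Pr\left[ \max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d_X(x, y)} > 16^j \right] = \Pr\left[ \bigcup_{i \in \mathbb{Z}} \left\{ \max_{y \in A_i(x)} \frac{\rho(x, y)}{d_X(x, y)} > 16^j \right\} \right] \le \sum_{i \in \mathbb{Z}} \Pr\left[ \max_{y \in A_i(x)} \frac{\rho(x, y)}{d_X(x, y)} > 16^j \right] \le \sum_{i \in \mathbb{Z}} \frac{32}{16^{j-1}} \log \frac{|B_X(x, 16^{i+j-1})|}{|B_X(x, 16^{i+j-2})|} = \frac{512}{16^j} \log n. \]

It follows that there exists a universal constant $C > 0$ such that for all $u > 0$ we have

\[ \Pr\left[ \max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d_X(x, y)} > u \right] \le \frac{C \log n}{u}. \]

Hence, using the a priori bound $\rho(x, y) \le 32n\, d_X(x, y)$, it follows that

\[ \mathbb{E}\left[ \max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d_X(x, y)} \right] = \int_0^{32n} \Pr\left[ \max_{y \in X \setminus \{x\}} \frac{\rho(x, y)}{d_X(x, y)} > u \right] du \le \int_0^{32n} \min\left\{ 1, \frac{C \log n}{u} \right\} du = O\left(1 + (\log n)^2\right). \]

This completes the proof of the upper bound in Theorem 1. □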



Remark 2. The above argument also shows that for every $n$-point metric space $(X, d_X)$ there exists a distribution $\mathcal{D}$ over non-contractive embeddings into ultrametrics $f : X \to U$ such that

\[ \mathbb{E}_{\mathcal{D}} |\nabla f(x)|_\infty = O\left(1 + (\log n) \log \Phi(X)\right), \]

where $\Phi(X)$ is the aspect ratio of $X$, which is defined by

\[ \Phi(X) = \frac{\mathrm{diam}\, X}{\min_{x \ne y} d_X(x, y)} = \frac{\max_{x, y \in X} d_X(x, y)}{\min_{x, y \in X,\ x \ne y} d_X(x, y)}. \]
3 Tight lower bounds for cycles, paths, and diamond graphs

As mentioned in the introduction, the metrics $Y_n$ in Theorem 1 are the diamond graphs of Newman and Rabinovich [34], which will be defined presently. Before passing to this more complicated (and strongest) lower bound, we will analyze the simpler examples of cycles and paths, which are of independent interest.

Let $C_n$, $n \ge 3$, be the unweighted cycle on $n$ vertices. We will identify $C_n$ with the group $\mathbb{Z}_n$ of integers modulo $n$. We first observe that in this special case the upper bound in Theorem 1 can be improved to $O(\log n)$. This is achieved by using Karp's embedding of the cycle into spanning paths: we simply choose an edge of $C_n$ uniformly at random and delete it. Let $f : C_n \to \mathbb{Z}$ be the randomized embedding thus obtained, which is clearly non-contractive.

As Karp observed, one can readily verify that as a probabilistic embedding into trees $f$ has distortion at most 2. We will now show that as a maximum gradient embedding, $f$ has distortion $O(\log n)$. Indeed, fix $x \in C_n$, and denote the deleted edge by $\{a, a+1\}$. Assume that $d_{C_n}(x, a) = t \le n/2 - 1$. Then the distance from $a + 1$ to $x$ changed from $t + 1$ in $C_n$ to $n - t - 1$ in the path. It is also easy to see that this is where the maximum gradient is attained. Thus

\[ \mathbb{E}|\nabla f(x)|_\infty \approx \frac{2}{n} \sum_{0 \le t \le n/2} \frac{n - t - 1}{t + 1} = \Theta(\log n). \]
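The expectation above can be checked by brute force for small $n$: delete each of the $n$ edges in turn, compute the maximum gradient of the identity map at a fixed vertex, and average. The sketch below does exactly that; the function name and the choice of the fixed vertex 0 are illustrative assumptions.

```python
import math


def expected_max_gradient_cycle(n, x=0):
    """Exact expected maximum gradient at x when a uniformly random edge of C_n is deleted."""
    def cycle_distance(u, v):
        k = abs(u - v) % n
        return min(k, n - k)

    total = 0.0
    for a in range(n):                       # the deleted edge is {a, a + 1 mod n}
        def path_distance(u, v, a=a):
            # After deleting {a, a+1}, the path visits a+1, a+2, ..., a (indices mod n).
            return abs((u - a - 1) % n - (v - a - 1) % n)

        total += max(
            path_distance(x, y) / cycle_distance(x, y)
            for y in range(n) if y != x
        )
    return total / n


value_4 = expected_max_gradient_cycle(4)
value_64 = expected_max_gradient_cycle(64)
ratio_64 = value_64 / math.log(64)
```

For $n = 4$ the four possible deletions give maximum gradients 3, 1, 1, 3 at vertex 0, so the average is 2; for $n = 64$ the average stays within a small constant multiple of $\log n$, consistent with the $\Theta(\log n)$ estimate.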



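We will now show that any maximum gradient embedding of $C_n$ into a distribution over trees incurs distortion $\Omega(\log n)$. For this purpose we will use the following lemma from [35].

Lemma 6. For any tree metric $T$, and any non-contractive embedding $g : C_n \to T$, there exists an edge $(x, x + 1)$ of $C_n$ such that $d_T(g(x), g(x + 1)) \ge \frac{n}{3} - 1$.

Now, let $\mathcal{D}$ be a distribution over non-contractive embeddings of $C_n$ into trees $f : C_n \to T$. By Lemma 6 we know that there exists $x \in C_n$ such that $d_T(f(x), f(x + 1)) \ge \frac{n - 3}{3}$. Thus for every $y \in C_n$ we have that $\max\{d_T(f(y), f(x)), d_T(f(y), f(x + 1))\} \ge \frac{n - 3}{6}$. On the other hand $\max\{d_{C_n}(y, x), d_{C_n}(y, x + 1)\} \le d_{C_n}(x, y) + 1$. It follows that

\[ |\nabla f(y)|_\infty \ge \frac{n - 3}{6 d_{C_n}(x, y) + 6}. \]

Summing this inequality over $y \in C_n$ we see that

\[ \sum_{y \in C_n} |\nabla f(y)|_\infty \ge \sum_{0 \le k \le n/2} \frac{n - 3}{6k + 6} = \Omega(n \log n). \]

Thus

\[ \max_{y \in C_n} \mathbb{E}_{\mathcal{D}} |\nabla f(y)|_\infty \ge \frac{1}{n} \sum_{y \in C_n} \mathbb{E}_{\mathcal{D}} |\nabla f(y)|_\infty = \Omega(\log n), \]

as required.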



We will now deal with the more complicated case of maximum gradient embeddings of the unweighted path on $n$ vertices, which we denote by $P_n$, into ultrametrics. The following proposition shows that Theorem 1 is optimal when one considers embeddings into ultrametrics. This is weaker than the lower bound in Theorem 1, which deals with embeddings into arbitrary trees (note that $P_n$ is a tree).

Proposition 7. Let $\mathcal{D}$ be a distribution over non-contractive embeddings of $P_n$ into ultrametrics $f : P_n \to U$. Then there exists $x \in P_n$ such that $\mathbb{E}_{\mathcal{D}} |\nabla f(x)|_\infty = \Omega\left((\log n)^2\right)$.
Before proving Proposition 7 we record the following numerical inequalities.

Lemma 8. The following elementary inequalities hold true:

1. For every $a, b \in \{0, 1, 2, \ldots\}$,

\[ a(\log a)^2 + b(\log b)^2 \ge (a + b)\left(\log(a + b)\right)^2 - 2\left[1 + \log\left(\frac{a + b}{a}\right)\right] a \log(a + b). \]

2. For every $x \ge 1$, $(1 + \log x)\log x \le 4\sqrt{x}$.

Proof. The first inequality is trivial if $a = 0$ or $b = 0$, so assume that $a, b \ge 1$. Denote for $t > 0$, $\psi(t) = t(\log t)^2$. Then

\[
\begin{aligned}
(a + b)\left(\log(a + b)\right)^2 - b(\log b)^2 &= \int_b^{a+b} \psi'(t)\, dt \\
&= \int_b^{a+b} \left[(\log t)^2 + 2\log t\right] dt \\
&\le a\left(\log(a + b)\right)^2 + 2a\log(a + b) \\
&= a(\log a)^2 + a\left[\log(a + b) + \log a\right] \cdot \log\left(\frac{a + b}{a}\right) + 2a\log(a + b) \\
&\le a(\log a)^2 + 2\left[1 + \log\left(\frac{a + b}{a}\right)\right] a\log(a + b),
\end{aligned}
\]

proving the first assertion in Lemma 8.

The second assertion in Lemma 8 follows from the inequality $\log x \le 2x^{1/4} - 1$, which is true since the minimum of the function $y \mapsto 2y^{1/4} - 1 - \log y$, which is attained at $y = 16$, is positive. □
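Both inequalities of Lemma 8 are easy to sanity-check numerically. The sketch below assumes that $\log$ denotes the natural logarithm, which is the reading under which the minimum of $y \mapsto 2y^{1/4} - 1 - \log y$ is attained at $y = 16$; the function names and the tested ranges are illustrative choices, and a numerical check is of course no substitute for the proof.

```python
import math


def lemma8_part1_gap(a, b):
    """Left-hand side minus right-hand side of the first inequality, for integers a, b >= 1."""
    lhs = a * math.log(a) ** 2 + b * math.log(b) ** 2
    rhs = ((a + b) * math.log(a + b) ** 2
           - 2 * (1 + math.log((a + b) / a)) * a * math.log(a + b))
    return lhs - rhs


def lemma8_part2_gap(x):
    """4 * sqrt(x) - (1 + log x) * log x, for x >= 1."""
    return 4 * math.sqrt(x) - (1 + math.log(x)) * math.log(x)


def auxiliary_gap(y):
    """2 * y**(1/4) - 1 - log y, the function minimized at y = 16 in the proof."""
    return 2 * y ** 0.25 - 1 - math.log(y)


part1_holds = all(lemma8_part1_gap(a, b) >= -1e-9
                  for a in range(1, 41) for b in range(1, 41))
part2_holds = all(lemma8_part2_gap(1 + 0.5 * i) >= 0 for i in range(20000))
minimum_at_16 = auxiliary_gap(16)
```

The value of the auxiliary function at 16 is $3 - \log 16 \approx 0.227$, which is the positive minimum used in the proof of the second assertion.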



Proof of Proposition 7. We think of $P_n$ as the interval of integers $I = \{0, \ldots, n - 1\} \subset \mathbb{R}$. Arguing the same as in the case of the cycle $C_n$, it is enough to prove that if $(U, d_U)$ is an ultrametric and $f : P_n \to U$ is non-contractive then

\[ \sum_{x=0}^{n-1} |\nabla f(x)|_\infty \ge c\, n (\log n)^2, \tag{12} \]

where $c > 0$ is a universal constant.

Given a sub-interval $J = \{a, a + 1, \ldots, a + t\} \subseteq \{0, \ldots, n - 1\}$ let $m_J$ be the largest point $m \in \{a + 1, \ldots, a + t\}$ for which $d_U(f(m - 1), f(m)) = \|f|_J\|_{\mathrm{Lip}} = \max_{1 \le i \le t} d_U(f(a + i - 1), f(a + i))$ (if $t = 0$ then we set $m_J = a$). Since the distortion of $J$ in any embedding into an ultrametric is at least $|J| - 1$ (see Lemma 2.4 in [31]), we know that $d_U(f(m_J - 1), f(m_J)) \ge t = |J| - 1$. We shall denote in what follows $J_s$ to be the shorter of the two intervals $\{a, a + 1, \ldots, m_J - 1\}$ and $\{m_J, \ldots, a + t\}$ (breaking ties arbitrarily), and $J_b$ will denote the longer of these two intervals (when $|J| = 1$ we use the convention $J_s = J_b$). Thus $J = J_s \cup J_b$ and $|J_s| \le |J_b|$. Finally, let $x_J$ be the point in $J_s$ which is closest to $J_b$ (so that $x_J \in \{m_J, m_J - 1\}$).

We define a function $g_J : J \to \mathbb{R}$ inductively as follows. If $1 \le |J_s| \le \sqrt{|J|}$ then



8jM) if X e \ {xj}, 

i [1+ log (i^)] 17,1 log |/| ifx^xj, 
gJiM) ifxeJh. 



(13) 
If, on the other hand, $|J_s| > \sqrt{|J|}$ then



gj^x) iixe /, and \x - xj\ > ViXi, 

if ^ € J, and \x - xj\ < im, (14) 
gJtix) iixe Jb. 



The following claim summarizes the crucial properties of these mappings. Recall that we are using the notation $I = \{0, \ldots, n-1\}$.

Claim 9. The following assertions hold true for every sub-interval $J \subseteq I$.

1. For every $x \in J$ we have $g_J(x) \le |\nabla(f|_J)(x)|_\infty = \max_{y \in J \setminus \{x\}} \frac{d_U(f(x), f(y))}{|x - y|}$.

2. For every $x \in J$, $g_J(x) \le |J| - 1$.

3. If\Js\ > V7 and \x - x^l < ^ then gj^x) < 4 

Proof. The proofs of all of the assertion in Claim |9] will be by induction on J. To prove the first assertion 
assume first that 1 < 17,1 < y/\T\. From the recursive definition in ([T3] ) it follows that we should show 
that I [l + log^jj^i^j |/s|log|7| < \^if\j)ixj)\co- Since xj e {mj - l,mj\ the definition of nij implies that 
|V(/l7)(xy)|oo > |7| - 1. Thus it is enough to show that ^ (1 + log |/|) vT/flog |7| < |7| - 1, which follows 
from the second assertion in Lemma [8l If, on the other hand, |/vl > VT/f then from the recursive definition 
in (fT4l) it follows that it is enough to show that for every x e JsWe have ^-l^u^ ^ |V(/l7)Wloo- But since U 
is an ultrametric we know that 



\[ |J| - 1 \le d_U(f(m_J - 1), f(m_J)) \le \max\{d_U(f(x), f(m_J - 1)),\ d_U(f(x), f(m_J))\}, \]

which implies the required lower bound on $|\nabla(f|_J)(x)|_\infty$ since $x_J \in \{m_J - 1, m_J\}$. The second assertion in Claim 9 is proved similarly.

It remains to prove the third assertion in Lemma |9] Let K c be the sub-interval of in which 
the value of gj^ix) was first set. In other words, ^ c 7^ is the smallest interval for which x € and 
gKix) = gj,ix). It follows in particular that \x - xk\ < '^IW^V Also, by construction it is always the case that 
either Ks or K}, is contained in the interval \ymn{xK, xj},r[i&x{xK, xj}]. Since is shorter than Kj, we are 
assured that 

\Ks\ < \XK - xj\ < \XK -x\ + \x- Xj\ < ^\ +^\< 2^\. (15) 
If l^r^l < Vl^ then necessarily x = xk and gK{x) was determined by the second line in (fT3] ). Hence 

8jXx) = 8k{x) = ^ 

where we used ([T5l) and the last inequality in ([T6l ) follows from the second assertion of LemmaHl 
Otherwise l^^l > Vi^ and gK{x) was determined by the second line in (fT4l ). i.e. 

gjM) = 8k{x) - , , < 1^1 < \K,f < 4^/\JA, 

\x - xk\ + I 

where we used (fTSl) . This completes the proof of Claim|9l □ 



\K,\ log \K\ < ^ [1 + log 17,1] ^log 17,1 < 4 ^ 



(16) 
With Claim 9 at hand we are in position to conclude the proof of Proposition 7. We will prove by induction on $|J|$ that
\[ \sum_{x \in J} g_J(x) \ge c\,|J|(\log |J|)^2. \tag{17} \]
This will prove (12), and hence imply Proposition 7, since by the first assertion of Claim 9 we get that
\[ \sum_{x=0}^{n-1} |\nabla f(x)|_\infty \ge \sum_{x \in I} g_I(x) \ge c\,n(\log n)^2. \]



Inequality ([TTl) trivially holds true with small enough constant c if |7| < 2^^, so assume that |7| > 2^. To 
prove ([TT] ) we distinguish between two cases. If < VT/f then since gj^{xj) < \ Js\ (by the second assertion 
in Claim |9l) we see by induction that 



^ gjix) ^ ^ gj, (x) + ^ gj, (x) + gj{Xj) - gj^ {Xj) 



xej 



xej, 
> c 



xeJt 



(|7,|(log 17,1)2 + \Ji,\{\og \h\f) + 2 



l.logl^ 



> c|7|(log|7|)2-2c 

> c|7|(log |/|)2, 



l.log|i^ 



l^|log|/| + 



|/,|log|/|-|/,| 

l^|log|/| 



1 + log — 



\Js 



(18) 

(19) 
(20) 



where in (fTSl ) we used the inductive hypothesis and the inductive definition in ([T3l) , in ( fT9l ) we used 
Lemma m and holds for c < j. 
On the other hand if l/J > VPl then 



x€j xeJ, xeJt xeJs ^ ' 



(21) 



xeJs 



> c|7|(log |7|)2 - 2c 



> c|7|(log|7|)2-2c 



> c|7|(log|7|)2-2c 

> c|7|(log|7|)2, 



1 + log (H]] log |y| + y ^ _ 8I7//4 (22) 



1^1 



W log |/| + -(171-1) log 17,1 -8|7p/4 
|7,|log|7| + ^(|7|- l)log|7,| 



(23) 
(24) 



where in (|2T]) we used the inductive definition in (fT4l ). in (l22l) we used the inductive hypothesis, Lemma [8] 
and Claim |9l and inequalities (l23l) and (l24l ) hold for |7| > 2^*^ and small enough c, respectively, since 
Y < |7,| > VfTf. This completes the proof of Proposition |7j □ 

We now pass to the proof of the lower bound in Theorem 1 in its full strength, i.e. in the case of maximum gradient embeddings into trees. We start by describing the diamond graphs $\{G_k\}_{k=1}^\infty$, and a special labelling of them that we will use throughout the ensuing arguments. The first diamond graph $G_1$ is a cycle of length 4, and $G_{k+1}$ is obtained from $G_k$ by replacing each edge by a quadrilateral. Thus $G_k$ has $4^k$ edges and $\frac{2 \cdot 4^k + 4}{3}$ vertices. As we have done before, the required lower bound on maximum gradient embeddings of $G_k$ into trees will be proved if we show that for every tree $T$ and every non-contractive embedding $f : G_k \to T$ we have

eeE(Gk) xee 

Note that the inequality (25) is different from the inequalities that we proved in the case of the cycle and the path in that the weighting on the vertices of $G_k$ that it induces is not uniform: high degree vertices get more weight in the average in the left-hand side of (25).
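The recursive definition of the diamond graphs can be sketched in a few lines of Python. This is a minimal illustration under the assumption that vertices are labelled by consecutive integers (the labelling is ours, not the canonical labelling introduced below); it only confirms the edge and vertex counts stated above.

```python
def diamond_graph(k: int):
    """Return (number_of_vertices, edge_list) of the diamond graph G_k.

    G_1 is the 4-cycle; G_{k+1} replaces every edge {u, v} of G_k by a
    quadrilateral u - a - v - b - u with two fresh vertices a and b.
    """
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    num_vertices = 4
    for _ in range(k - 1):
        new_edges = []
        for u, v in edges:
            a, b = num_vertices, num_vertices + 1  # two new vertices per old edge
            num_vertices += 2
            new_edges += [(u, a), (a, v), (u, b), (b, v)]
        edges = new_edges
    return num_vertices, edges
```

For instance, `diamond_graph(2)` has 12 vertices and 16 edges, in agreement with the formulas $\frac{2 \cdot 4^k + 4}{3}$ and $4^k$.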

We will prove (25) by induction on $k$. In order to facilitate such an induction, we will first strengthen the inductive hypothesis. To this end we need to introduce a useful labelling of $G_k$. For $1 \le i \le k$ the graph $G_k$ contains $4^{k-i}$ canonical copies of $G_i$, which we index by elements of $\{1,2,3,4\}^{k-i}$, and denote $\{G^{(i)}_{[\alpha]}\}_{\alpha \in \{1,2,3,4\}^{k-i}}$. These graphs are defined as follows; see Figures 1 and 2 for a schematic description.




Figure 1: The graph G2 and the labelling of the canonical copies of Gi contained in it. 




Figure 2: The graph G3 and the induced labelling of canonical copies of Gi and G2. 

Formally, we set $G^{(k)}_{[\emptyset]} = G_k$, and assume inductively that the canonical subgraphs of $G_{k-1}$ have been defined. Let $H_1, H_2, H_3, H_4$ be the top-right, top-left, bottom-right and bottom-left copies of $G_{k-1}$ in $G_k$, respectively. For $\alpha \in \{1,2,3,4\}^{k-1-i}$ and $j \in \{1,2,3,4\}$ we denote the copy of $G_i$ in $H_j$ corresponding to $G^{(i)}_{[\alpha]}$ by $G^{(i)}_{[j\alpha]}$.



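For every $1 \le i \le k$ and $\alpha \in \{1,2,3,4\}^{k-i}$ let $T^{(i)}_{[\alpha]}, B^{(i)}_{[\alpha]}, L^{(i)}_{[\alpha]}, R^{(i)}_{[\alpha]}$ be the topmost, bottom-most, left-most, and right-most vertices of $G^{(i)}_{[\alpha]}$, respectively. We will construct inductively a set of simple cycles $\mathcal{C}_{[\alpha]}$ in $G^{(i)}_{[\alpha]}$ and for each $C \in \mathcal{C}_{[\alpha]}$ an edge $\varepsilon_C \in E(C)$, with the following properties.

1. The cycles in $\mathcal{C}_{[\alpha]}$ are edge-disjoint, and they all pass through the vertices $T^{(i)}_{[\alpha]}, B^{(i)}_{[\alpha]}, L^{(i)}_{[\alpha]}, R^{(i)}_{[\alpha]}$. There are $2^{i-1}$ cycles in $\mathcal{C}_{[\alpha]}$, and each of them contains $2^{i+1}$ edges. Thus in particular the cycles in $\mathcal{C}_{[\alpha]}$ form a disjoint cover of the edges in $G^{(i)}_{[\alpha]}$.

2. If $C \in \mathcal{C}_{[\alpha]}$ and $\varepsilon_C = \{x, y\}$ then $d_T(f(x), f(y)) \ge \frac{2^{i+1}}{3} - 1$.

3. Denote $E_{[\alpha]} = \{\varepsilon_C : C \in \mathcal{C}_{[\alpha]}\}$ and $\Delta_i = \bigcup_{\alpha \in \{1,2,3,4\}^{k-i}} E_{[\alpha]}$. The edges in $\Delta_i$ will be called the designated edges of level $i$. For $\alpha \in \{1,2,3,4\}^{k-i}$, $C \in \mathcal{C}_{[\alpha]}$ and $j < i$ let $\Delta_j(C) = \Delta_j \cap E(C)$ be the designated edges of level $j$ on $C$. Then we require that each of the two paths $T^{(i)}_{[\alpha]} - L^{(i)}_{[\alpha]} - B^{(i)}_{[\alpha]}$ and $T^{(i)}_{[\alpha]} - R^{(i)}_{[\alpha]} - B^{(i)}_{[\alpha]}$ contains exactly $2^{i-j-1}$ edges from $\Delta_j(C)$.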

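The construction is done by induction on $i$. For $i = 1$ and $\alpha \in \{1,2,3,4\}^{k-1}$ we let $\mathcal{C}_{[\alpha]}$ contain only the 4-cycle $G^{(1)}_{[\alpha]}$ itself. Moreover by Lemma 6 there is an edge $\varepsilon_{G^{(1)}_{[\alpha]}} \in E(G^{(1)}_{[\alpha]})$ such that if $\varepsilon_{G^{(1)}_{[\alpha]}} = \{x, y\}$ then $d_T(f(x), f(y)) \ge \frac{1}{3}$. This completes the construction for $i = 1$. Assuming we have completed the construction for $i - 1$ we construct the cycles at level $i$ as follows. Fix arbitrary cycles $C_1 \in \mathcal{C}_{[1\alpha]}$, $C_2 \in \mathcal{C}_{[2\alpha]}$, $C_3 \in \mathcal{C}_{[3\alpha]}$, $C_4 \in \mathcal{C}_{[4\alpha]}$. We will use these four cycles to construct two cycles in $\mathcal{C}_{[\alpha]}$. The first one consists of the $T^{(i)}_{[\alpha]} - R^{(i)}_{[\alpha]}$ path in $C_1$ which contains the edge $\varepsilon_{C_1}$, the $R^{(i)}_{[\alpha]} - B^{(i)}_{[\alpha]}$ path in $C_3$ which does not contain the edge $\varepsilon_{C_3}$, the $B^{(i)}_{[\alpha]} - L^{(i)}_{[\alpha]}$ path in $C_4$ which contains the edge $\varepsilon_{C_4}$, and the $L^{(i)}_{[\alpha]} - T^{(i)}_{[\alpha]}$ path in $C_2$ which does not contain the edge $\varepsilon_{C_2}$. The remaining edges in $E(C_1) \cup E(C_2) \cup E(C_3) \cup E(C_4)$ constitute the second cycle that we extract from $C_1, C_2, C_3, C_4$. Continuing in this manner by choosing cycles from $\mathcal{C}_{[1\alpha]} \setminus \{C_1\}$, $\mathcal{C}_{[2\alpha]} \setminus \{C_2\}$, $\mathcal{C}_{[3\alpha]} \setminus \{C_3\}$, $\mathcal{C}_{[4\alpha]} \setminus \{C_4\}$ and repeating this procedure, and then continuing until we exhaust the cycles in $\mathcal{C}_{[1\alpha]} \cup \mathcal{C}_{[2\alpha]} \cup \mathcal{C}_{[3\alpha]} \cup \mathcal{C}_{[4\alpha]}$, we obtain the set of cycles $\mathcal{C}_{[\alpha]}$. For every $C \in \mathcal{C}_{[\alpha]}$ we then apply Lemma 6 to obtain an edge $\varepsilon_C$ with the required property.

For each edge $e \in E(G_k)$ let $\alpha \in \{1,2,3,4\}^{k-i}$ be the unique multi-index such that $e \in E(G^{(i)}_{[\alpha]})$. We denote by $C_i(e)$ the unique cycle in $\mathcal{C}_{[\alpha]}$ containing $e$. We will also denote $\hat{e}_i(e) = \varepsilon_{C_i(e)}$. Finally we let $a_i(e) \in e$ and $b_i(e) \in \hat{e}_i(e)$ be vertices such that
\[ d_T(f(a_i(e)), f(b_i(e))) = \max_{a \in e,\ b \in \hat{e}_i(e)} d_T(f(a), f(b)). \]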

Note that by the definition ofeiie) and the triangle inequality we are assured that 

1 /2'+' \ 2' 

dTifiai{e)),mie))) > - — - 1 > — . (26) 



Recall that we plan to prove (25) by induction on $k$. Having done all of the above preparation, we are now in position to strengthen (25) so as to make the inductive argument easier. Given two edges $e, h \in E(G_k)$ we write $e \sim_i h$ if both $e, h$ are on the same canonical copy of $G_i$ in $G_k$, $C_i(e) = C_i(h) = C$, and furthermore $e$ and $h$ are on the same side of $C$. In other words, $e \sim_i h$ if there is $\alpha \in \{1,2,3,4\}^{k-i}$ and $C \in \mathcal{C}_{[\alpha]}$ such that if we partition the edges of $C$ into two disjoint $T^{(i)}_{[\alpha]} - B^{(i)}_{[\alpha]}$ paths, then $e$ and $h$ are on the same path.

Let $m \in \mathbb{N}$ be a universal constant that will be specified later. For every integer $\ell \le k/m$ and any $\alpha \in \{1,2,3,4\}^{k-\ell m}$ define



max 



^r(/fam(e)),mm(g)))A2" 



We also write $L_\ell = \min_{\alpha \in \{1,2,3,4\}^{k-\ell m}} L_\ell(\alpha)$. We will prove that $L_\ell \ge L_{\ell-1} + c\ell$, where $c > 0$ is a universal constant. This will imply that for $\ell = \lfloor k/m \rfloor$ we have $L_\ell = \Omega(k^2)$ (since $m$ is a universal constant). By simple arithmetic (25) follows.

Observe that for every a e {1, 2, 3, 4}^^'"^ we have 



- y 

Am / 1 



1 



dT{f{aiAe)),f{bM{e)))A2" 

max ; — 

is|i,.../l dr„(e,e,\„(e)) + \ 

^,11,2,3,41- «£(G'» ),J„.S;,;(,) 



- y 

Am / 1 



/36| 1,2,3,4}' 



4.m({-l) 
1 

4m(f-l) 



max 



^/r(/(a™(e)),/(^™(^')))A2" 



i£{i,...,f-i| dGi,{e,eim{e)) + I 



- y 

^mt Z—l 



max < 



eeE[G 



dG^{e,ec^{e)) + 1 



max 



dT{f{au„{e)),f{bUe)))A2" 



i6{i,_^-i| dG;^{e,eini{e)) + \ 



/3e{ 1,2,3,4)"' 



- y 



max < 



eeElG 



max 



^/r(/(«™(^')),m>n(^')))A2" 



- y 



max < 



(£{i,_^-i| dGt{e,ei,„{e)) + I 

dT{f{a(m{e))J{be,n{e)))M''" 
dG,{e,7Ue)) + l ■ 

^/r(/(«™(^')),m>n(e)))A2" 



max 

ie{l,.^f-l) 



dG^{e,eim{e)) + 1 
Thus it is enough to show that

def 1 



eeElc' 



max< 



^ dT{f{aUe)),f{bem{e))) Al'"" ^ 
U, : — : — — — — : • 1(, 



-■emetmie)] 



- max 



dT{f{aiAe)),f{bi,n{e)))hr 



m,..j-\] do I, {e, Cim (e)) + 1 



Q.{0- (27) 



To prove dTTl ) denote for C € "^[q,] 
e e E{C) : sc e and 



dT{f{ai,„{e))J{bM)) AT'" ^ 1 dT{f{aUe)),f{bUe)))M 

max — > - • — 

/E|i,.^f-il dG,^{e,eim{e)) + \ 2 dG,,{e,ee,„{e)) + I 



tm 



Then using (1261) we see that 



A > 



— y y 



dj(^f{aimie))J{bimie)y) A 2 



tm 



1 y y dT{KaUe))J{be,n{e))) M"" l_ y y 



dTifiaemie)), f{b{mie))) A 2 



1 ^ ^ 2™' 1 ^ ^ 

, Ami 

Mi 9 . 4""" 



dT{f{aUe)),f{be,n{e)))M''" 



dGkie,eem{e)) + 1 



dT{f{ae,n{e)),f{bUe))) Al^"^ 



= nl-^-ma,\-2"'^ -mA — 3— y y 



dT{f(af,„(e)),f{bUe))) Al^"" 



(28) 



To estimate the negative term in (1281 ) fix C € '^[a] ■ For every edge e e S c (which implies in particular 
XhsX'etmie) - sc) we fix an integer / < £ such that e -^im 'e'imie) and 



2"" ^ dT{f{aim{e))J{bin,{e))) A 2"" ^ 1 dT{f{aUe))J{bt,n{e))) A 2^'" 

dGi,{e,7i,n{e)) + 1 ~ dGt{e,7im{e)) + 1 ~ 2 <iGt(e,'?fm(e)) + 1 

1 2^™ 
> , 

12 dG;,{e,sc) + \ 

or 

dG,{e,7im{e)) + 1 < 2('-^)'"+4 [^/^X^", ec) + 1] ■ (29) 
We shall call the edge $\hat{e}_{im}(e)$ the designated edge that inserted $e$ into $S_C$. For a designated edge $\varepsilon \in E(C)$ of level $im$ (i.e. $\varepsilon \in \Delta_{im}(C)$) we shall denote by $S_C(\varepsilon)$ the set of edges of $C$ which $\varepsilon$ inserted to $S_C$. Denoting $D_\varepsilon = d_{G_k}(\varepsilon, \varepsilon_C) + 1$ we see that (29) implies that for $e \in S_C(\varepsilon)$ we have

\[ \left| D_\varepsilon - \left[d_{G_k}(e, \varepsilon_C) + 1\right] \right| \le 2^{(i-\ell)m+4} \left[d_{G_k}(e, \varepsilon_C) + 1\right]. \tag{30} \]

Assuming that $m \ge 5$ we are assured that $2^{(i-\ell)m+4} \le \frac{1}{2}$. Thus (30) implies that



< dG,{e,Ec) + 1 < 



Hence 



dG,(e,7c„Ae)) + I - 2-i 2-i Zj c/g, (e, er) + 1 

eeSc * (=1 £eA„„(C) eSfTcCe) * 



^-1 o^m 

Z Z T 

/=1 £eA„„(C) -f-T 



1 



1 + 2('-^)'"+4 



0(l)-2^'"|]|A™(C)|-log|i^ 
0(1) • 2^"'{ ■ 2(^-')'" • 2('"^)'" = 0(1) • 2^""; 



Thus, using ( [281 ) we see that 



A = a(mO - 0(1) • ^ • I^mI 2""^^ = a{m£) - 0(1)^ = 0(0, 

provided that m is a large enough absolute constant. 

This completes the proof of the lower bound in Theorem 1.



4 Monotone clustering problems 

In this section we give some examples which illustrate how certain monotone clustering problems can be 
solved efficiently on ultrametrics. Our arguments are quite flexible, and apply in more general situations. 
Before passing to these algorithms, we make a few general remarks on the framework for monotone clustering that was discussed in the introduction.

In the definition of monotone clustering we required that $\Gamma(x, d, P)$ is homogeneous in $d$. One might wonder whether it is possible to consider also higher orders of homogeneity, i.e. clustering cost functions $\Gamma$ which satisfy $\Gamma(x, \lambda d, P) = \lambda^p\,\Gamma(x, d, P)$ for some $p > 1$ (this occurs, for example, in the $k$-means clustering problem, where the goal is to find $k$ "centers" that minimize the sum over the data points of the squared distance to the closest center). For the proof of Theorem 2 to work in this setting we need a distribution over non-contractive embeddings into ultrametrics $f : X \to U$ with a polylogarithmic upper bound on the expected value of $|\nabla f(x)|_\infty^p$. Unfortunately, this is impossible to achieve in general. Indeed, let $f : C_n \to T$ be a random non-contractive embedding of the $n$-cycle into trees. Lemma 6 implies that there exists an edge $\{x, x+1\} \in E(C_n)$ for which $d_T(f(x), f(x+1)) \ge \frac{n}{3} - 1$. Thus
\[ \sum_{\{x,y\} \in E(C_n)} d_T(f(x), f(y))^p \ge \left(\frac{n}{3} - 1\right)^p. \]
Taking expectation we see that
\[ \max_{x \in C_n} \mathbb{E}\,|\nabla f(x)|_\infty^p \ge \frac{1}{n} \sum_{x \in C_n} \mathbb{E}\,|\nabla f(x)|_\infty^p \ge \frac{1}{n}\left(\frac{n}{3} - 1\right)^p, \]
which grows polynomially in $n$ when $p > 1$.

We note, however, that the proof of Theorem 2 used the homogeneity of $\Gamma$ in a weak way. In order to get a polylogarithmic reduction to ultrametrics it is enough to assume, for example, that for every $\lambda \ge 1$ we have $\Gamma(x, \lambda d, P) = (\mathrm{polylog}(n)) \cdot \lambda \cdot \Gamma(x, d, P)$.

Our second remark concerns the fact that the solution space for monotone clustering problem that was 
presented in the introduction was 2^^^ . This is a huge space, and as we have seen in Section [TTTl by setting 
the clustering cost function to be oo on certain possible clustering solutions it is possible to reduce the size 
of this space. Additionally, in the arguments is Section [T7T] the cost function F ignored the structure of the 
solution space. Thus in a more generic formulation of the monotone clustering framework we can assume 
that the solution space is some abstract finite set S{X). For example, in our version of the fault-tolerant 
fc-median problem we can take the solution space to be 

4.1 Monotone clustering on ultrametrics via dynamic programming 

We now pass to the design of some monotone clustering algorithms on ultrametrics. It is a standard fact (see for example [6]) that any ultrametric $(U, d_U)$ can be represented as follows. There is a graph theoretical tree $T = (V, E)$ such that $U$ is the set of leaves of $T$. The vertices of $T$ are labelled by $\Delta : V \to [0, \infty)$ and for every $u, v \in U$ we have $d_U(u, v) = \Delta(\mathrm{lca}(u, v))$, where $\mathrm{lca}(u, v)$ is the least common ancestor of $u$ and $v$ in $T$. We may, and will, assume in what follows that every vertex of $T$ is either a leaf or has exactly two children.
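The tree representation above can be illustrated with a short Python sketch. The encoding is an assumption made for this example: a leaf is a string, and an internal vertex is a triple `(label, left, right)`. The sketch also assumes that labels do not decrease towards the root, which is what makes the resulting distance an ultrametric.

```python
def tree_ultrametric(tree):
    """Return a dict d with d[(x, y)] = label of lca(x, y) for distinct leaves x, y.

    Encoding (hypothetical, for illustration): a leaf is a str; an internal
    vertex is a tuple (label, left_subtree, right_subtree).
    """
    d = {}

    def visit(t):
        if isinstance(t, str):
            return [t]
        label, left, right = t
        left_leaves, right_leaves = visit(left), visit(right)
        # Leaves on different sides of this vertex have it as their lca.
        for x in left_leaves:
            for y in right_leaves:
                d[(x, y)] = d[(y, x)] = label
        return left_leaves + right_leaves

    for leaf in visit(tree):
        d[(leaf, leaf)] = 0.0
    return d

EXAMPLE_TREE = (4.0, (1.0, "a", "b"), (2.0, "c", (1.0, "d", "e")))
```

Here `tree_ultrametric(EXAMPLE_TREE)[("a", "e")]` is `4.0`, the label of the root, while `("c", "d")` maps to `2.0`.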

We begin by showing that the fault-tolerant version of the ^-median problem described in can be 
solved exactly on ultrametrics. 

Lemma 10. The minimization of the objective function in ^ can be solved exactly on any n-point ultramet- 
ric in time 0{kn^). 

Proof. Let $(U, d_U)$ be an $n$-point ultrametric and let $T = (V, E)$ be a binary tree with vertex labels $\Delta : V \to [0, \infty)$ which represents $U$. We also assume that we are given fault-tolerant parameters $\{j(u)\}_{u \in U}$. For every $v \in V$ let $T_v$ denote the subtree of $T$ rooted at $v$. Define for $v \in V$ and $s \in \{0, \ldots, k\}$
\[ \mathrm{cost}^*(v, s) = \min\left\{ \sum_{\substack{x \in T_v \cap U \\ j(x) \le s}} d_U\left(x, x^*_{j(x)}(x; d_U)\right) : x_1, x_2, \ldots, x_s \in T_v \cap U \right\}. \tag{31} \]

Our goal is to compute $\mathrm{cost}^*(r, k)$, where $r$ is the root of $T$. This will be done using dynamic programming. For any leaf $u \in U$ and $s \in \{0, \ldots, k\}$ define $\mathrm{cost}(u, s) = 0$. Let $v \in V$ be an internal vertex with two children $u, w \in V$. Define recursively
\[ \mathrm{cost}(v, s) = \min_{t \in \{0, \ldots, s\}} \Big[ \mathrm{cost}(u, t) + \mathrm{cost}(w, s - t) + \Delta(v) \cdot \big( |\{x \in T_u \cap U : t < j(x) \le s\}| + |\{x \in T_w \cap U : s - t < j(x) \le s\}| \big) \Big]. \tag{32} \]



A bottom-up computation of the dynamic program in ( [32l ) computes cost(v, s) naively in 0{kn^) time. 
We will be done if we show that $\mathrm{cost}(v, s) = \mathrm{cost}^*(v, s)$ for any $v \in V$ and $s \in \{0, \ldots, k\}$. The fact that $\mathrm{cost}^*(v, s) \le \mathrm{cost}(v, s)$ is obvious since (32) computes a feasible solution of (31) (this fact is proved by a straightforward induction).

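We prove the reverse inequality by induction on the distance of $v$ from the leaves of $T$. Let $x_1, \ldots, x_s \in T_v \cap U$ be such that
\[ \mathrm{cost}^*(v, s) = \sum_{\substack{x \in T_v \cap U \\ j(x) \le s}} d_U\left(x, x^*_{j(x)}(x; d_U)\right). \]
Let $u, w$ be the children of $v$ in $T$. We may reorder the points so that for some $t \in \{0, \ldots, s\}$ we have $\{x_1, \ldots, x_t\} = T_u \cap \{x_1, \ldots, x_s\}$ and $\{x_{t+1}, \ldots, x_s\} = T_w \cap \{x_1, \ldots, x_s\}$. Then
\begin{align}
\mathrm{cost}^*(v, s) &= \sum_{\substack{x \in T_v \cap U \\ j(x) \le s}} d_U\left(x, x^*_{j(x)}(x; d_U)\right) \nonumber \\
&= \sum_{\substack{x \in T_u \cap U \\ j(x) \le t}} d_U\left(x, x^*_{j(x)}(x; d_U)\right) + \sum_{\substack{x \in T_w \cap U \\ j(x) \le s-t}} d_U\left(x, x^*_{j(x)}(x; d_U)\right) \nonumber \\
&\quad + \Delta(v) \cdot \big( |\{x \in T_u \cap U : t < j(x) \le s\}| + |\{x \in T_w \cap U : s-t < j(x) \le s\}| \big) \tag{33} \\
&\ge \mathrm{cost}^*(u, t) + \mathrm{cost}^*(w, s - t) \nonumber \\
&\quad + \Delta(v) \cdot \big( |\{x \in T_u \cap U : t < j(x) \le s\}| + |\{x \in T_w \cap U : s-t < j(x) \le s\}| \big) \tag{34} \\
&\ge \mathrm{cost}(u, t) + \mathrm{cost}(w, s - t) \nonumber \\
&\quad + \Delta(v) \cdot \big( |\{x \in T_u \cap U : t < j(x) \le s\}| + |\{x \in T_w \cap U : s-t < j(x) \le s\}| \big) \tag{35} \\
&\ge \mathrm{cost}(v, s), \tag{36}
\end{align}
where in (33) we used the fact that the tree $T$ represents the ultrametric $(U, d_U)$, in (34) we used the definition of $\mathrm{cost}^*(u, t)$ and $\mathrm{cost}^*(w, s - t)$ given by (31), in (35) we used the inductive hypothesis, and in (36) we used (32). □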
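The recursion (32) can be checked against brute force on a toy instance. The Python sketch below is illustrative only and rests on several assumptions that are ours: the tree encoding (a leaf is a string, an internal vertex is `(label, left, right)`), labels that do not decrease towards the root, fault-tolerance parameters $j(x) \ge 1$, and centers that may repeat (multisets), which is consistent with setting the cost of a leaf to $0$ for every $s$. No attempt is made to match the running time stated in Lemma 10.

```python
from itertools import combinations_with_replacement
from math import inf

def leaves(tree):
    """List the leaf names of a tree (leaf = str, internal = (label, left, right))."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[1]) + leaves(tree[2])

def dist(tree, x, y):
    """d_U(x, y): the label of the least common ancestor of leaves x and y."""
    if x == y:
        return 0.0
    label, left, right = tree
    x_left, y_left = x in leaves(left), y in leaves(left)
    if x_left != y_left:
        return label
    return dist(left if x_left else right, x, y)

def dp_cost(tree, j, k):
    """Return [cost(v, 0), ..., cost(v, k)] for the root v of `tree`, following (32)."""
    if isinstance(tree, str):
        return [0.0] * (k + 1)
    label, left, right = tree
    cost_u, cost_w = dp_cost(left, j, k), dp_cost(right, j, k)
    leaves_u, leaves_w = leaves(left), leaves(right)
    table = []
    for s in range(k + 1):
        best = inf
        for t in range(s + 1):  # t centers in T_u, s - t centers in T_w
            # Points whose j(x)-th closest center lies across the root pay label.
            far = sum(1 for x in leaves_u if t < j[x] <= s)
            far += sum(1 for x in leaves_w if s - t < j[x] <= s)
            best = min(best, cost_u[t] + cost_w[s - t] + label * far)
        table.append(best)
    return table

def brute_force(tree, j, s):
    """Minimum of the objective in (31) over all multisets of s centers."""
    points = leaves(tree)
    best = inf
    for centers in combinations_with_replacement(points, s):
        total = 0.0
        for x in points:
            if j[x] <= s:
                total += sorted(dist(tree, x, c) for c in centers)[j[x] - 1]
        best = min(best, total)
    return best

TREE = (4.0, (1.0, "a", "b"), (2.0, "c", (1.0, "d", "e")))
J = {"a": 1, "b": 2, "c": 1, "d": 3, "e": 2}
```

On this instance the two computations agree for every $s \in \{0, 1, 2, 3\}$; for example, with a single center both give cost `4.0`.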



Our final result is the proof of Lemma 3, which yields an FPTAS for the $\Sigma\ell_p$ clustering problem on ultrametrics. We start with the following inequality.

Lemma 11. Fix p >\ and assume that ai > ^2 > • • • > a„ > and bi,. . . ,b„ > 0. Then 
Proof. The proof is by induction on n, and the inductive hypothesis simplifies to 



^ n+l n+l' 



(37) 



Denote for x > 



+ X 
Inequality (|37] ) is f(b'^^^) < /(O), so it is enough to prove that / is decreasing. But

1 1 1 1 

-* ~ ~T~p x^n UP T~p ~ ~T~p \^ T~p ~ 

since ai > <3„+i. □ 

Proof of Lemma\3l Let {U,du) be an «-point ultrametric and let T - (V, £) be a binary tree with vertex 
labels A : V ^ [0, oo) which represents U. For v e V , { e {0, . . . ,k}, s e {0, . . . ,n} and t e [0, oo) define 
B*(v, €, s, t) to be the minimum cost according to ^ to cluster Ty n U using ^ sets and centers, when we are 
allowed to exclude s points from T^, n U, and the most costly cluster has cost t. 

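We next define a "pseudo cost" $B(v, \ell, s, t)$ inductively as follows. If $v$ is a leaf then define $B(v, 1, 0, 0) = B(v, 1, 1, 0) = B(v, 0, 1, 0) = 0$, and for all other values of $\ell, s, t$ we set $B(v, \ell, s, t) = \infty$. When $v$ has children $u$ and $w$ define:
\[ B(v, \ell, s, t) = \min \Big\{ B(u, \ell_1, s_1, t_1) + B(w, \ell_2, s_2, t_2) + \left(t_1^p + r_2 \Delta(v)^p\right)^{1/p} - t_1 + \left(t_2^p + r_1 \Delta(v)^p\right)^{1/p} - t_2 \Big\}, \]
where the minimum is over all $s_1, r_1, s_2, r_2 \in \{0, \ldots, n\}$, $t_1, t_2 \in [0, t]$ and $\ell_1 \in \{0, \ldots, \ell\}$ such that $\ell = \ell_1 + \ell_2$, $r_1 \le s_1$, $r_2 \le s_2$, $s = s_1 + s_2 - r_1 - r_2$, and
\[ t = \max\left\{ \left(t_1^p + r_2 \Delta(v)^p\right)^{1/p},\ \left(t_2^p + r_1 \Delta(v)^p\right)^{1/p} \right\}. \]

With these definitions we will prove the following claim by induction.

Claim 12. For every $v \in V$, $\ell \in \{0, \ldots, k\}$, $s \in \{0, \ldots, n\}$ and $t \in [0, \infty)$ we have
\[ B^*(v, \ell, s, t) = B(v, \ell, s, t). \]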

Assuming the validity of Claim 12 for the moment, we conclude as follows. The dynamic programming algorithm described above does not suffice since the parameter $t$ takes values in the range $[0, \infty)$, while we need it to take only $\mathrm{poly}(n)$ values. We fix this issue using an argument which is based on ideas from [5].

Normalize the distances in $U$ so that the minimum distance is 1, and denote $\Phi = \mathrm{diam}(U)$. We can clearly assume that $t \le n\Phi$. Assume first of all that we can ensure that $t \le A = O(\mathrm{poly}(n))$. Once this is achieved then all we need to do is to apply a standard discretization procedure as follows. Fix an integer $M > 0$ which will be determined presently and let $A' = \{0, A/M, 2A/M, \ldots, A\}$. For $t \in [0, A]$ denote by $\mathrm{rd}(t)$ the rounding of $t$ to its closest value in $A'$. We can now define a discretized dynamic programming procedure $B'(v, \ell, s, \tau)$, where $v, \ell, s$ take the same values as in the definition of $B(v, \ell, s, t)$ and $\tau \in A'$. This is done by defining as before for a leaf $v \in U$, $B'(v, 1, 0, 0) = B'(v, 1, 1, 0) = B'(v, 0, 1, 0) = 0$, and for all other values of $\ell, s, \tau$ setting $B'(v, \ell, s, \tau) = \infty$. When $v$ has children $u$ and $w$ define:
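\[ B'(v, \ell, s, \tau) = \min \Big\{ B'(u, \ell_1, s_1, \tau_1) + B'(w, \ell_2, s_2, \tau_2) + \left(\tau_1^p + r_2 \Delta(v)^p\right)^{1/p} - \tau_1 + \left(\tau_2^p + r_1 \Delta(v)^p\right)^{1/p} - \tau_2 \Big\}, \]
where the minimum is over all $s_1, r_1, s_2, r_2 \in \{0, \ldots, n\}$, $\tau_1, \tau_2 \in A'$ and $\ell_1 \in \{0, \ldots, \ell\}$ such that $\ell = \ell_1 + \ell_2$, $r_1 \le s_1$, $r_2 \le s_2$, $s = s_1 + s_2 - r_1 - r_2$, and
\[ \tau = \mathrm{rd}\left( \max\left\{ \left(\tau_1^p + r_2 \Delta(v)^p\right)^{1/p},\ \left(\tau_2^p + r_1 \Delta(v)^p\right)^{1/p} \right\} \right). \]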



It is straightforward to check by induction that for any v e V,( e {0, . . . ,k], s e {0,...,n] and t e [0, A] we 
have 

4\T I 

\B(v, e, s, t) - B'(v, €, s, rd(0)| < 

M 

Since the optimal value of the $\Sigma\ell_p$ clustering problem is at least 1 (excluding trivial cases), as this is the smallest distance in $U$, $B'$ will yield an approximation algorithm for this problem whose multiplicative error is bounded by $1 + O(n/M)$. Taking $M = n/\varepsilon$ for some $\varepsilon \in (0, 1)$ we obtain the required PTAS.
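The rounding map $\mathrm{rd}$ used in the discretization above can be sketched as follows. This is a minimal Python illustration with hypothetical names; it only shows that rounding to the grid $\{0, A/M, \ldots, A\}$ moves a value $t \in [0, A]$ by at most $A/(2M)$, which is the source of the $O(1/M)$-type error terms.

```python
def make_rd(A: float, M: int):
    """Return rd: [0, A] -> {0, A/M, 2A/M, ..., A}, rounding to a closest grid value."""
    def rd(t: float) -> float:
        index = round(t * M / A)          # index of a nearest grid point
        index = min(max(index, 0), M)     # clamp to the grid
        return index * A / M
    return rd

rd = make_rd(A=10.0, M=5)   # grid 0, 2, 4, 6, 8, 10
```

With this grid, `rd(3.1)` is `4.0` and `rd(0.9)` is `0.0`; ties between two neighbouring grid values may be resolved either way, since both are closest.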

We therefore need to argue that we can ensure that $t = O(\mathrm{poly}(n))$. Recall that we can assume that $t \le n\Phi$. Let $P = \{(x_1, C_1), \ldots, (x_k, C_k)\}$ be the (yet unknown) optimal solution of the $\Sigma\ell_p$ clustering problem with $k$ centers on $U$. Let $h$ be the maximum length appearing in the solution, i.e. $h = \max_{1 \le i \le k} \max_{x \in C_i} d_U(x_i, x)$. Fix $\varepsilon \in (0, 1)$ and define two "levels" of the tree $T$ by



and 



L-{veV : A(v) <h< A(parent(v))} , 



Q-\veV : A(v) < ^ < A(parent(v)) 



Let T' be the subtree obtained from T by deleting the subtrees {Ty \ {vW^^q, and let U' denote the leaves of 
T' . Equivalently, U' is obtained from U by contracting all distances smaller that sh/rp-. It is straightforward 
to check that costj//(P) < cost[/(P) < (1 + e)costj/'(P). 

Note that for every v e L the aspect ratio (i.e. the ratio of the diameter and the shortest distance) of 
n U' is at most n^/s. So, by the above reasoning (in the case of an a priori polynomial bound on t) we can 
approximate in polynomial time the value of $B^*(v, \ell, s, t)$ up to a factor $1 + O(\varepsilon)$. It remains to "glue" these approximate solutions to a solution of the $\Sigma\ell_p$ clustering problem on $T$. This is done by a (simpler) dynamic programming argument as follows. Denote by $\widehat{T}$ the subtree of $T'$ whose root is the same as that of $T'$ and whose leaves are $L$. For $v \in \widehat{T}$ let $C^*(v, \ell)$ be the optimal solution of the $\Sigma\ell_p$ clustering problem on $T_v \cap U'$ with $\ell$ centers and assuming that the largest distance appearing in the solution is at most $h$. We calculate $C^*(v, \ell)$ by dynamic programming: For $v \in L$ define $C(v, \ell) = \min_t B^*(v, \ell, 0, t)$, and if $v$ has two children $u, w$ in $\widehat{T}$ then

\[ C(v, \ell) = \min \{ C(u, \ell_1) + C(w, \ell_2) : \ell_1 \in \{0, \ldots, \ell\},\ \ell_1 + \ell_2 = \ell \}. \]

A straightforward induction shows that $C^*(v, \ell) = C(v, \ell)$.
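The gluing recursion is a min-plus combination of the two children's tables. A minimal Python sketch, assuming the values $C(v, \cdot)$ at the vertices of $L$ are already available as lists (the encoding and the sample numbers are hypothetical):

```python
from math import inf

def glue(tree, leaf_cost, k):
    """Return [C(v, 0), ..., C(v, k)] for the root v of `tree`.

    tree: a leaf name (str) or a pair (left, right).
    leaf_cost[name][l]: the precomputed value C(v, l) for a leaf v of the
    glued tree; inf marks an infeasible number of centers.
    """
    if isinstance(tree, str):
        return list(leaf_cost[tree])
    cost_u = glue(tree[0], leaf_cost, k)
    cost_w = glue(tree[1], leaf_cost, k)
    # C(v, l) = min over l1 + l2 = l of C(u, l1) + C(w, l2)
    return [min(cost_u[l1] + cost_w[l - l1] for l1 in range(l + 1))
            for l in range(k + 1)]

SAMPLE_COSTS = {            # made-up tables; index = number of centers
    "a": [inf, 4, 1, 0],
    "b": [inf, 6, 2, 1],
    "c": [inf, 3, 3, 3],
}
```

With these sample tables, three centers must be split one per leaf, so `glue((("a", "b"), "c"), SAMPLE_COSTS, 3)` ends in `4 + 6 + 3 = 13`, and fewer than three centers is infeasible.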

The only thing that is left to be explained is how to find the value $h$. This is done by exhaustive search: We try all the $\binom{n}{2}$ possible values of $h$, do the above procedure for each of them, and take the minimum of the values that we get.
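The exhaustive search ranges over the pairwise distances of $U$. A short Python sketch of enumerating the candidate values, using an illustrative dyadic ultrametric on $\{0, \ldots, 7\}$ that is our own example and not taken from the paper:

```python
from itertools import combinations

def candidate_h_values(points, d):
    """Distinct pairwise distances, i.e. the values of h tried by the search."""
    return sorted({d(x, y) for x, y in combinations(points, 2)})

def dyadic(x: int, y: int) -> int:
    """Illustrative ultrametric: size of the smallest dyadic interval containing x, y."""
    return 0 if x == y else 1 << (x ^ y).bit_length()

CANDIDATES = candidate_h_values(range(8), dyadic)
```

There are $\binom{8}{2} = 28$ pairs here but only three distinct distances, `[2, 4, 8]`, so in practice the search may visit far fewer values than the worst-case count.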
The proof of Lemma 3 will be complete once we prove Claim 12. We first note that $B^*(v, \ell, s, t) \le B(v, \ell, s, t)$. This is true because $B(\cdot)$ represents a feasible solution of $B^*(\cdot)$. The proof of this fact is by induction. If $u, w \in V$ are the children of $v$ in $T$ then there exist $s_1, s_2, t_1, t_2, r_1, r_2, \ell_1, \ell_2$ such that
\[ B(v, \ell, s, t) = B(u, \ell_1, s_1, t_1) + B(w, \ell_2, s_2, t_2) + \left(t_1^p + r_2 \Delta(v)^p\right)^{1/p} - t_1 + \left(t_2^p + r_1 \Delta(v)^p\right)^{1/p} - t_2, \]
where $s_1, r_1, s_2, r_2 \in \{0, \ldots, n\}$, $t_1, t_2 \in [0, t]$, $\ell_1 \in \{0, \ldots, \ell\}$, $r_1 \le s_1$, $r_2 \le s_2$, $s = s_1 + s_2 - r_1 - r_2$, $\ell = \ell_1 + \ell_2$, and $t = \max\left\{ \left(t_1^p + r_2 \Delta(v)^p\right)^{1/p}, \left(t_2^p + r_1 \Delta(v)^p\right)^{1/p} \right\}$. By the inductive hypothesis $B(u, \ell_1, s_1, t_1)$ and $B(w, \ell_2, s_2, t_2)$ correspond to feasible solutions of $B^*(\cdot)$ on $T_u \cap U$ and $T_w \cap U$, respectively. Hence $B(v, \ell, s, t)$ corresponds to the following feasible solution: Take the union of the centers in $T_u \cap U$ and $T_w \cap U$ and retain all the current clusters in $T_u \cap U$ and $T_w \cap U$ as is. Next add arbitrary $r_1$ unclustered points from $T_u \cap U$ (from the pool of unclustered points that we are assuming exist in $T_u \cap U$) to the cluster with the most weight in $T_w \cap U$, and similarly add $r_2$ unclustered points from $T_w \cap U$ to the cluster with the most weight in $T_u \cap U$. This creates the required feasible solution.

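We next prove by induction that $B^*(v, \ell, s, t) \ge B(v, \ell, s, t)$. Consider the clustering solution at which $B^*(v, \ell, s, t)$ is attained. It corresponds to $s$ excluded leaves $y_1, \ldots, y_s \in T_v \cap U$, $\ell$ "centers" $x_1, \ldots, x_\ell \in (T_v \cap U) \setminus \{y_1, \ldots, y_s\}$ and a partition $\{C_1, \ldots, C_\ell\}$ of $(T_v \cap U) \setminus \{y_1, \ldots, y_s\}$ such that
\[ B^*(v, \ell, s, t) = \sum_{j=1}^{\ell} \Big( \sum_{x \in C_j} d(x, x_j)^p \Big)^{1/p}. \]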

Moreover, assuming without loss of generality that 

d{x, xiY = max d{x, xjY 



xeC 



xeCi 



the definition of B*(v, €, s, t) guarantees that 



Y,d{x,Xi)P 



\xeCi 



- t. 



By reordering the points we may assume that xi, . . . ,Xf^ e Tu and xt^+l, 
£). Denote 



( ^1 1 






( h+C^ \ 






- r2 and 


U 


V;=i ) 









nr,. 



. , Xf,+f2' ^ (recall that {i+€2 - 



Finally, we may assume that 



and 



Denote 



V dix,X])'' = max V d{x,Xj)'', 
^ /e{l,.../il ^ 

def d{x,X{.j,\y - max V d{x,XjY. 



xeCr^+inT,, 



xeCjnT„. 



n Tvi, and A„ 



U o 






We also write ^'i - \{yi,. . .,ys} n r„| + n and S2 = \{yi, . . .,ys} C) T„\ + r2, so that s = si + S2 - ri - r2. 
Note that by definition 



2] cKx,xjr 

j=l \xeCjnT„ 



> B\uJi,Si,ti), 



(38) 



and 



> B*{w,i2,S2,t2)- 



(39) 



Thus 



d{x,Xj)P + \Cjf^AjmP 

xeCjDT,, 



1/P 



2] J(x,Xy)P + |CynA„|A(v)^ 



\[ \ge B^*(u, \ell_1, s_1, t_1) + B^*(w, \ell_2, s_2, t_2) + \left(t_1^p + r_2 \Delta(v)^p\right)^{1/p} - t_1 + \left(t_2^p + r_1 \Delta(v)^p\right)^{1/p} - t_2 \tag{40} \]
\[ \ge B(u, \ell_1, s_1, t_1) + B(w, \ell_2, s_2, t_2) + \left(t_1^p + r_2 \Delta(v)^p\right)^{1/p} - t_1 + \left(t_2^p + r_1 \Delta(v)^p\right)^{1/p} - t_2 \tag{41} \]
\[ \ge B(v, \ell, s, t), \tag{42} \]

where in (40) we used Lemma 11 together with (38) and (39), in (41) we used the inductive hypothesis, and in (42) we used the definition of $B(\cdot)$. This completes the proof of Lemma 3. □